Name: team_bbg
Size: 2
Details:
| Member Name | Member NetId |
|---|---|
| Pushpit Saxena | pushpit2 |
| Venslaus Prakash Arokiaraj | vpa2 |
This project focuses on studying the factors that play a statistically significant role in influencing life expectancy. The factors we will focus on include economic factors, social factors, health-service factors (such as immunization levels), mortality rates, and various other health-related factors. We will build several multiple linear regression models and apply some of the concepts we have learned in this course (STAT 420) to analyze and find appropriate models for predicting life expectancy.
Based on the description of the dataset on Kaggle, the Global Health Observatory (GHO) data repository under the World Health Organization (WHO) keeps track of the health status as well as many other related factors for all countries. The datasets are made available to the public for the purpose of health data analysis. This dataset was collected from the WHO and United Nations websites, and the individual data files were then combined into a single dataset (read more here).
The dataset we will be using for this project is the Life Expectancy data that can be found at Life Expectancy (WHO). The dataset has 22 variables and 2938 observations, which need some cleanup. (Note: we have also provided the dataset as part of the .zip [lifeExpectancyData] that we have uploaded for this proposal).
Following are some of the important variables used in this dataset:
Country (String): Country of observation
Year (Integer): Year of observation
Status (String): Whether the country of observation is developed or developing.
Life expectancy (Decimal): Life expectancy in age
Adult Mortality (Integer): Adult Mortality Rates of both sexes (probability of dying between 15 and 60 years per 1000 population)
Infant deaths (Integer): Number of Infant Deaths per 1000 population
Alcohol (Decimal): Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol)
Percentage Expenditure (Decimal): Expenditure on health as a percentage of Gross Domestic Product per capita(%)
Hepatitis B (Int): Hepatitis B (HepB) immunization coverage among 1-year-olds (%)
Measles (Int): Measles - number of reported cases per 1000 population
BMI (Decimal): Average Body Mass Index of entire population
Under-five deaths (Int): Number of under-five deaths per 1000 population
Polio (Int): Polio (Pol3) immunization coverage among 1-year-olds (%)
Total expenditure (Decimal): General government expenditure on health as a percentage of total government expenditure (%)
Diphtheria (Int): Diphtheria tetanus toxoid and pertussis (DTP3) immunization coverage among 1-year-olds (%)
HIV/AIDS (Decimal): Deaths per 1 000 live births HIV/AIDS (0-4 years)
GDP (Decimal): Gross Domestic Product per capita (in USD)
Population (Int): Population of the country
We have attached the data file in the .zip [lifeExpectancyData] .
If needed, this dataset can also be downloaded from kaggle
From the standpoint of our research and learning, one of the primary reasons we picked this dataset is its size as well as the variety of predictors available. We believe this dataset is well suited for practicing and implementing the majority of the techniques we have learned in the course, and for getting hands-on experience with a real-life dataset.
From the larger perspective of data exploration and using data science to address real-world issues, this dataset gives us an opportunity to try to answer some of the most important questions facing the human race, such as which factors affect longevity. As we briefly mentioned in our project description, we are interested in determining the factors that contribute to lower life expectancy. In particular, because the observations in this dataset are organized by country, if we are able to find a good model then we can answer questions such as what a country needs to focus on in order to improve its life expectancy.
## # A tibble: 2,938 x 22
## Country Year Status Life.expectancy Adult.Mortality infant.deaths Alcohol
## <chr> <int> <chr> <dbl> <int> <int> <dbl>
## 1 Afghan… 2015 Devel… 65 263 62 0.01
## 2 Afghan… 2014 Devel… 59.9 271 64 0.01
## 3 Afghan… 2013 Devel… 59.9 268 66 0.01
## 4 Afghan… 2012 Devel… 59.5 272 69 0.01
## 5 Afghan… 2011 Devel… 59.2 275 71 0.01
## 6 Afghan… 2010 Devel… 58.8 279 74 0.01
## 7 Afghan… 2009 Devel… 58.6 281 77 0.01
## 8 Afghan… 2008 Devel… 58.1 287 80 0.03
## 9 Afghan… 2007 Devel… 57.5 295 82 0.02
## 10 Afghan… 2006 Devel… 57.3 295 84 0.03
## # … with 2,928 more rows, and 15 more variables: percentage.expenditure <dbl>,
## # Hepatitis.B <int>, Measles <int>, BMI <dbl>, under.five.deaths <int>,
## # Polio <int>, Total.expenditure <dbl>, Diphtheria <int>, HIV.AIDS <dbl>,
## # GDP <dbl>, Population <dbl>, thinness..1.19.years <dbl>,
## # thinness.5.9.years <dbl>, Income.composition.of.resources <dbl>,
## # Schooling <dbl>
Life Expectancy (response):
## [1] 65.0 59.9 59.9 59.5 59.2 58.8 58.6 58.1 57.5 57.3
Loading the Data:
library(countrycode)  # provides countrycode()
library(stringr)      # provides str_replace_all()
raw_data <- read.csv("LifeExpectancyData.csv")
# Add the continent of each country
raw_data$Continent <- countrycode(sourcevar = raw_data[, "Country"],
                                  origin = "country.name",
                                  destination = "continent")
# Add the region of each country
raw_data$region <- countrycode(sourcevar = raw_data[, "Country"],
                               origin = "country.name",
                               destination = "region")
Changing the names of the fields to follow a more consistent pattern (snake case):
# Collapse each run of dots into a single underscore, then lower-case
col_names <- tolower(trimws(str_replace_all(colnames(raw_data), "\\.+", "_")))
# col_names <- tolower(str_replace_all(colnames(raw_data), "\\s+", ""))
colnames(raw_data) <- col_names
Snippet of the raw dataset:
## # A tibble: 2,938 x 24
## country year status life_expectancy adult_mortality infant_deaths alcohol
## <fct> <fct> <fct> <dbl> <int> <int> <dbl>
## 1 Afghan… 2015 Devel… 65 263 62 0.01
## 2 Afghan… 2014 Devel… 59.9 271 64 0.01
## 3 Afghan… 2013 Devel… 59.9 268 66 0.01
## 4 Afghan… 2012 Devel… 59.5 272 69 0.01
## 5 Afghan… 2011 Devel… 59.2 275 71 0.01
## 6 Afghan… 2010 Devel… 58.8 279 74 0.01
## 7 Afghan… 2009 Devel… 58.6 281 77 0.01
## 8 Afghan… 2008 Devel… 58.1 287 80 0.03
## 9 Afghan… 2007 Devel… 57.5 295 82 0.02
## 10 Afghan… 2006 Devel… 57.3 295 84 0.03
## # … with 2,928 more rows, and 17 more variables: percentage_expenditure <dbl>,
## # hepatitis_b <int>, measles <int>, bmi <dbl>, under_five_deaths <int>,
## # polio <int>, total_expenditure <dbl>, diphtheria <int>, hiv_aids <dbl>,
## # gdp <dbl>, population <dbl>, thinness_1_19_years <dbl>,
## # thinness_5_9_years <dbl>, income_composition_of_resources <dbl>,
## # schooling <dbl>, continent <fct>, region <fct>
Summary of numeric fields:
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|---|
| life_expectancy | 36.30 | 63.10 | 72.10 | 69.22 | 75.70 | 89.00 | 10.0 |
| adult_mortality | 1.00 | 74.00 | 144.00 | 164.80 | 228.00 | 723.00 | 10.0 |
| infant_deaths | 0.00 | 0.00 | 3.00 | 30.30 | 22.00 | 1800.00 | 0.0 |
| alcohol | 0.01 | 0.88 | 3.76 | 4.60 | 7.70 | 17.87 | 194.0 |
| percentage_expenditure | 0.00 | 4.69 | 64.91 | 738.25 | 441.53 | 19479.91 | 0.0 |
| hepatitis_b | 1.00 | 77.00 | 92.00 | 80.94 | 97.00 | 99.00 | 553.0 |
| measles | 0.00 | 0.00 | 17.00 | 2419.59 | 360.25 | 212183.00 | 0.0 |
| bmi | 1.00 | 19.30 | 43.50 | 38.32 | 56.20 | 87.30 | 34.0 |
| under_five_deaths | 0.00 | 0.00 | 4.00 | 42.04 | 28.00 | 2500.00 | 0.0 |
| polio | 3.00 | 78.00 | 93.00 | 82.55 | 97.00 | 99.00 | 19.0 |
| total_expenditure | 0.37 | 4.26 | 5.76 | 5.94 | 7.49 | 17.60 | 226.0 |
| diphtheria | 2.00 | 78.00 | 93.00 | 82.32 | 97.00 | 99.00 | 19.0 |
| hiv_aids | 0.10 | 0.10 | 0.10 | 1.74 | 0.80 | 50.60 | 0.1 |
| gdp | 1.68 | 463.94 | 1766.95 | 7483.16 | 5910.81 | 119172.74 | 448.0 |
| population | 34.00 | 195793.25 | 1386542.00 | 12753375.12 | 7420359.00 | 1293859294.00 | 652.0 |
| thinness_1_19_years | 0.10 | 1.60 | 3.30 | 4.84 | 7.20 | 27.70 | 34.0 |
| thinness_5_9_years | 0.10 | 1.50 | 3.30 | 4.87 | 7.20 | 28.60 | 34.0 |
| income_composition_of_resources | 0.00 | 0.49 | 0.68 | 0.63 | 0.78 | 0.95 | 167.0 |
| schooling | 0.00 | 10.10 | 12.30 | 11.99 | 14.30 | 20.70 | 163.0 |
We can see that only 10 observations have missing values for the response field life_expectancy. We drop those 10 observations, since removing so few rows should make little difference to the models we fit. Number of observations remaining:
## [1] 2928
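The code that drops the rows with a missing response is not shown above. A minimal sketch of one way to do it, using a small made-up data frame (`toy_raw` and `toy_mod` are hypothetical names and values, not taken from the dataset):

```r
# Toy stand-in for the raw data (hypothetical values, not from the dataset)
toy_raw <- data.frame(
  country         = c("A", "A", "B", "B"),
  life_expectancy = c(65.0, NA, 72.1, 70.3),
  alcohol         = c(0.01, 0.02, NA, 3.5)
)

# Keep only the rows whose response is observed.
# Predictors may still contain NA and are handled in a later step.
toy_mod <- toy_raw[!is.na(toy_raw$life_expectancy), ]

nrow(toy_mod)
```

Filtering on the response alone, rather than calling `na.omit()` on the whole data frame, keeps rows that only have missing predictors, so those can still be imputed later.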
There are still 1279 observations with some missing values. We will use the mean of the value for a given country to impute some of these values:
new_df <- mod_data_df %>% group_by(country) %>% mutate_if(is.numeric,
function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
nrow(new_df[!complete.cases(new_df), ])
## [1] 800
There are still some observations with missing values, which occur when a country has no recorded value at all for a variable. Next we use the mean of the values for a given region in a particular year to impute these remaining missing values:
cleaned_df <- as.data.frame(new_df %>% group_by(region, year) %>% mutate_if(is.numeric,
function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)) %>% ungroup)
cleaned_df$region <- as.factor(cleaned_df$region)
cleaned_df$year <- as.factor(cleaned_df$year)
nrow(cleaned_df[!complete.cases(cleaned_df), ])
## [1] 0
Finally, we have imputed all the missing values, and our final dataset has 2928 observations.
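The two-stage imputation described above (country mean first, then region-and-year mean for whatever is still missing) can also be sketched in base R. This is only an illustration on made-up data: the helper `impute_group_mean` and the `toy` data frame are hypothetical and are not part of our pipeline.

```r
# Toy data: country B has no gdp value at all, so the country mean cannot fill it
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "C"),
  region  = c("R1", "R1", "R1", "R1", "R1", "R2"),
  year    = c(2000, 2001, 2002, 2000, 2001, 2000),
  gdp     = c(10, NA, 30, NA, NA, 5)
)

# Replace NA values by the mean of their group; the grouping vectors go in `...`
impute_group_mean <- function(x, ...) {
  ave(x, ..., FUN = function(v) {
    m <- mean(v, na.rm = TRUE)  # NaN when every value in the group is missing
    v[is.na(v)] <- m
    v
  })
}

# Stage 1: fill with the country mean (B stays missing, as NaN)
toy$gdp <- impute_group_mean(toy$gdp, toy$country)

# Stage 2: fill what is left with the region-by-year mean
toy$gdp <- impute_group_mean(toy$gdp, toy$region, toy$year)

toy$gdp
```

Because `is.na(NaN)` is `TRUE`, values that stage 1 could not fill are picked up again in stage 2 without any special handling.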
| Region | #Records | Avg. Life Expectancy | Avg. Infant Deaths | Avg. Adult Deaths |
|---|---|---|---|---|
| East Asia & Pacific | 422 | 71.34231 | 25.265403 | 137.62260 |
| Europe & Central Asia | 770 | 75.95456 | 2.724675 | 109.26432 |
| Latin America & Caribbean | 498 | 73.07319 | 7.339357 | 135.32661 |
| Middle East & North Africa | 320 | 73.16312 | 11.281250 | 105.65625 |
| North America | 32 | 79.87500 | 14.093750 | 61.40625 |
| South Asia | 128 | 67.37422 | 250.039062 | 164.50781 |
| Sub-Saharan Africa | 768 | 57.08685 | 47.593750 | 283.07812 |
## continent count mean_life_expectancy mean_infant_deaths mean_adult_deaths
## 1 Africa 864 57.80 44.246528 266.57176
## 2 Americas 530 73.90 7.747170 130.84659
## 3 Asia 752 72.55 60.875000 133.43750
## 4 Europe 626 77.80 1.172524 98.01282
## 5 Oceania 166 69.40 1.120482 135.08750
library(ggplot2)
ggplot(data = dd, aes(x = continent, y = mean_life_expectancy)) + geom_bar(stat = "identity")
## Warning: Removed 10 rows containing non-finite values (stat_boxplot).
cor_mat <- round(cor(na.omit(raw_data[, !(names(raw_data) %in% c("year",
"country", "continent", "status", "region"))])), 1)
cor_mat
## life_expectancy adult_mortality infant_deaths
## life_expectancy 1.0 -0.7 -0.2
## adult_mortality -0.7 1.0 0.0
## infant_deaths -0.2 0.0 1.0
## alcohol 0.4 -0.2 -0.1
## percentage_expenditure 0.4 -0.2 -0.1
## hepatitis_b 0.2 -0.1 -0.2
## measles -0.1 0.0 0.5
## bmi 0.5 -0.4 -0.2
## under_five_deaths -0.2 0.1 1.0
## polio 0.3 -0.2 -0.2
## total_expenditure 0.2 -0.1 -0.1
## diphtheria 0.3 -0.2 -0.2
## hiv_aids -0.6 0.6 0.0
## gdp 0.4 -0.3 -0.1
## population 0.0 0.0 0.7
## thinness_1_19_years -0.5 0.3 0.5
## thinness_5_9_years -0.5 0.3 0.5
## income_composition_of_resources 0.7 -0.4 -0.1
## schooling 0.7 -0.4 -0.2
## alcohol percentage_expenditure hepatitis_b
## life_expectancy 0.4 0.4 0.2
## adult_mortality -0.2 -0.2 -0.1
## infant_deaths -0.1 -0.1 -0.2
## alcohol 1.0 0.4 0.1
## percentage_expenditure 0.4 1.0 0.0
## hepatitis_b 0.1 0.0 1.0
## measles -0.1 -0.1 -0.1
## bmi 0.4 0.2 0.1
## under_five_deaths -0.1 -0.1 -0.2
## polio 0.2 0.1 0.5
## total_expenditure 0.2 0.2 0.1
## diphtheria 0.2 0.1 0.6
## hiv_aids 0.0 -0.1 -0.1
## gdp 0.4 1.0 0.0
## population 0.0 0.0 -0.1
## thinness_1_19_years -0.4 -0.3 -0.1
## thinness_5_9_years -0.4 -0.3 -0.1
## income_composition_of_resources 0.6 0.4 0.2
## schooling 0.6 0.4 0.2
## measles bmi under_five_deaths polio
## life_expectancy -0.1 0.5 -0.2 0.3
## adult_mortality 0.0 -0.4 0.1 -0.2
## infant_deaths 0.5 -0.2 1.0 -0.2
## alcohol -0.1 0.4 -0.1 0.2
## percentage_expenditure -0.1 0.2 -0.1 0.1
## hepatitis_b -0.1 0.1 -0.2 0.5
## measles 1.0 -0.2 0.5 -0.1
## bmi -0.2 1.0 -0.2 0.2
## under_five_deaths 0.5 -0.2 1.0 -0.2
## polio -0.1 0.2 -0.2 1.0
## total_expenditure -0.1 0.2 -0.1 0.1
## diphtheria -0.1 0.2 -0.2 0.6
## hiv_aids 0.0 -0.2 0.0 -0.1
## gdp -0.1 0.3 -0.1 0.2
## population 0.3 -0.1 0.7 0.0
## thinness_1_19_years 0.2 -0.5 0.5 -0.2
## thinness_5_9_years 0.2 -0.6 0.5 -0.2
## income_composition_of_resources -0.1 0.5 -0.1 0.3
## schooling -0.1 0.6 -0.2 0.4
## total_expenditure diphtheria hiv_aids gdp
## life_expectancy 0.2 0.3 -0.6 0.4
## adult_mortality -0.1 -0.2 0.6 -0.3
## infant_deaths -0.1 -0.2 0.0 -0.1
## alcohol 0.2 0.2 0.0 0.4
## percentage_expenditure 0.2 0.1 -0.1 1.0
## hepatitis_b 0.1 0.6 -0.1 0.0
## measles -0.1 -0.1 0.0 -0.1
## bmi 0.2 0.2 -0.2 0.3
## under_five_deaths -0.1 -0.2 0.0 -0.1
## polio 0.1 0.6 -0.1 0.2
## total_expenditure 1.0 0.1 0.0 0.2
## diphtheria 0.1 1.0 -0.1 0.2
## hiv_aids 0.0 -0.1 1.0 -0.1
## gdp 0.2 0.2 -0.1 1.0
## population -0.1 0.0 0.0 0.0
## thinness_1_19_years -0.2 -0.2 0.2 -0.3
## thinness_5_9_years -0.2 -0.2 0.2 -0.3
## income_composition_of_resources 0.2 0.3 -0.2 0.4
## schooling 0.2 0.4 -0.2 0.5
## population thinness_1_19_years
## life_expectancy 0.0 -0.5
## adult_mortality 0.0 0.3
## infant_deaths 0.7 0.5
## alcohol 0.0 -0.4
## percentage_expenditure 0.0 -0.3
## hepatitis_b -0.1 -0.1
## measles 0.3 0.2
## bmi -0.1 -0.5
## under_five_deaths 0.7 0.5
## polio 0.0 -0.2
## total_expenditure -0.1 -0.2
## diphtheria 0.0 -0.2
## hiv_aids 0.0 0.2
## gdp 0.0 -0.3
## population 1.0 0.3
## thinness_1_19_years 0.3 1.0
## thinness_5_9_years 0.3 0.9
## income_composition_of_resources 0.0 -0.5
## schooling 0.0 -0.5
## thinness_5_9_years
## life_expectancy -0.5
## adult_mortality 0.3
## infant_deaths 0.5
## alcohol -0.4
## percentage_expenditure -0.3
## hepatitis_b -0.1
## measles 0.2
## bmi -0.6
## under_five_deaths 0.5
## polio -0.2
## total_expenditure -0.2
## diphtheria -0.2
## hiv_aids 0.2
## gdp -0.3
## population 0.3
## thinness_1_19_years 0.9
## thinness_5_9_years 1.0
## income_composition_of_resources -0.4
## schooling -0.5
## income_composition_of_resources schooling
## life_expectancy 0.7 0.7
## adult_mortality -0.4 -0.4
## infant_deaths -0.1 -0.2
## alcohol 0.6 0.6
## percentage_expenditure 0.4 0.4
## hepatitis_b 0.2 0.2
## measles -0.1 -0.1
## bmi 0.5 0.6
## under_five_deaths -0.1 -0.2
## polio 0.3 0.4
## total_expenditure 0.2 0.2
## diphtheria 0.3 0.4
## hiv_aids -0.2 -0.2
## gdp 0.4 0.5
## population 0.0 0.0
## thinness_1_19_years -0.5 -0.5
## thinness_5_9_years -0.4 -0.5
## income_composition_of_resources 1.0 0.8
## schooling 0.8 1.0
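Rounded to one decimal, the matrix above shows a few pairs of predictors with a correlation of 0.9 or more (infant_deaths with under_five_deaths, percentage_expenditure with gdp, and the two thinness variables). Reading such pairs off a printed matrix is error-prone, so here is a minimal sketch of how they could be listed programmatically. The helper `high_cor_pairs`, the cutoff of 0.9, and the toy data are all illustrative assumptions.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.05),  # nearly a copy of x1
  x3 = rnorm(n)                   # unrelated to the other two
)
cm <- cor(toy)

# List each pair of variables whose absolute correlation reaches the cutoff.
# upper.tri() keeps one copy of every pair and drops the diagonal.
high_cor_pairs <- function(cm, cutoff = 0.9) {
  idx <- which(abs(cm) >= cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

pairs_found <- high_cor_pairs(cm)
pairs_found
```

Applied to `cor_mat`, the same helper would be called as `high_cor_pairs(cor_mat)`.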
Splitting the data into training and test sets (90% training, 10% hold-out test set):
set.seed(19851115)
le_trn_data_idx <- sample(nrow(cleaned_df), size = trunc(0.90 * nrow(cleaned_df)))
le_trn_data <- cleaned_df[le_trn_data_idx, ]
le_tst_data <- cleaned_df[-le_trn_data_idx, ]
We ignore all the categorical variables for now, except status. We have fitted models using some of these categorical variables but could not get better results; the code can be seen in the Appendix.
We started by fitting a full additive model (with all the numeric predictors and status). This gives us a good baseline model for simple as well as more nuanced feature selection later.
##
## Call:
## lm(formula = life_expectancy ~ ., data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.6482 -2.2808 -0.1263 2.2784 17.4919
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.521e+01 6.664e-01 82.838 < 2e-16 ***
## statusDeveloping -1.243e+00 2.834e-01 -4.386 1.20e-05 ***
## adult_mortality -1.818e-02 8.274e-04 -21.975 < 2e-16 ***
## infant_deaths 9.078e-02 8.753e-03 10.371 < 2e-16 ***
## alcohol 3.047e-02 2.696e-02 1.130 0.2585
## percentage_expenditure 1.562e-04 7.757e-05 2.014 0.0441 *
## hepatitis_b -2.051e-03 4.082e-03 -0.502 0.6155
## measles -1.497e-05 7.809e-06 -1.917 0.0553 .
## bmi 3.719e-02 5.170e-03 7.192 8.27e-13 ***
## under_five_deaths -6.775e-02 6.406e-03 -10.575 < 2e-16 ***
## polio 2.547e-02 4.731e-03 5.384 7.92e-08 ***
## total_expenditure 1.127e-02 3.447e-02 0.327 0.7437
## diphtheria 3.305e-02 5.048e-03 6.547 7.04e-11 ***
## hiv_aids -4.746e-01 1.777e-02 -26.709 < 2e-16 ***
## gdp 2.965e-05 1.195e-05 2.481 0.0132 *
## population 6.525e-10 1.900e-09 0.343 0.7313
## thinness_1_19_years -7.204e-02 4.970e-02 -1.449 0.1473
## thinness_5_9_years -7.721e-03 4.899e-02 -0.158 0.8748
## income_composition_of_resources 6.297e+00 6.552e-01 9.611 < 2e-16 ***
## schooling 7.378e-01 4.511e-02 16.355 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.959 on 2615 degrees of freedom
## Multiple R-squared: 0.828, Adjusted R-squared: 0.8267
## F-statistic: 662.3 on 19 and 2615 DF, p-value: < 2.2e-16
Using the t-test for the significance of individual coefficients, alcohol (p-value 0.2585), for example, does not have a significant linear relationship with life_expectancy once the other predictors are in the model.
So we started with the simple (not recommended) method of removing some of the least significant predictors. There also appears to be high collinearity between infant_deaths and under_five_deaths (see the VIF values below and the correlation matrix shown earlier).
## status adult_mortality
## 1.949467 1.756053
## infant_deaths alcohol
## 165.233443 1.982791
## percentage_expenditure hepatitis_b
## 4.085965 1.691643
## measles bmi
## 1.372574 1.795997
## under_five_deaths polio
## 165.237301 2.038419
## total_expenditure diphtheria
## 1.202812 2.389284
## hiv_aids gdp
## 1.396273 4.412362
## population thinness_1_19_years
## 1.555639 8.034095
## thinness_5_9_years income_composition_of_resources
## 8.115884 3.156427
## schooling
## 3.775685
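For reference, the variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it by hand on made-up data, independently of whichever package function produced the values above. `vif_manual` and the `toy` data frame are hypothetical names used only for illustration.

```r
set.seed(42)
n <- 300
a <- rnorm(n)
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.1),  # almost collinear with a
  c = rnorm(n)                 # independent of a and b
)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(toy, c("a", "b", "c"))
round(vifs, 2)
```

In the toy data, `a` and `b` get large values while `c` stays close to 1, which is the same pattern that infant_deaths and under_five_deaths show above (both about 165).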
So we removed some of the least significant predictors, dropping under_five_deaths and keeping infant_deaths:
sig_additive_model <- lm(life_expectancy ~ adult_mortality +
infant_deaths + bmi + diphtheria + hiv_aids + gdp +
income_composition_of_resources * status + schooling,
data = non_cat_predictor_df)
summary(sig_additive_model)
##
## Call:
## lm(formula = life_expectancy ~ adult_mortality + infant_deaths +
## bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources *
## status + schooling, data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.9816 -2.2703 -0.0947 2.4075 18.9792
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 4.604e+01 3.066e+00 15.017
## adult_mortality -1.860e-02 8.439e-04 -22.047
## infant_deaths -2.677e-03 7.299e-04 -3.667
## bmi 4.484e-02 4.994e-03 8.979
## diphtheria 5.473e-02 3.811e-03 14.360
## hiv_aids -4.908e-01 1.806e-02 -27.176
## gdp 4.063e-05 7.465e-06 5.442
## income_composition_of_resources 1.669e+01 3.706e+00 4.505
## statusDeveloping 6.624e+00 3.040e+00 2.179
## schooling 7.792e-01 4.482e-02 17.385
## income_composition_of_resources:statusDeveloping -9.567e+00 3.631e+00 -2.635
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## adult_mortality < 2e-16 ***
## infant_deaths 0.00025 ***
## bmi < 2e-16 ***
## diphtheria < 2e-16 ***
## hiv_aids < 2e-16 ***
## gdp 5.74e-08 ***
## income_composition_of_resources 6.94e-06 ***
## statusDeveloping 0.02944 *
## schooling < 2e-16 ***
## income_composition_of_resources:statusDeveloping 0.00846 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.076 on 2624 degrees of freedom
## Multiple R-squared: 0.817, Adjusted R-squared: 0.8163
## F-statistic: 1171 on 10 and 2624 DF, p-value: < 2.2e-16
sig_interative_model <- lm(life_expectancy ~ (adult_mortality +
under_five_deaths + bmi + diphtheria + hiv_aids + gdp +
income_composition_of_resources + schooling) ^ 2 ,
data = non_cat_predictor_df)
summary(sig_interative_model)
##
## Call:
## lm(formula = life_expectancy ~ (adult_mortality + under_five_deaths +
## bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources +
## schooling)^2, data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.8667 -2.0800 -0.0783 2.0935 14.9944
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 4.560e+01 1.566e+00 29.119
## adult_mortality -3.429e-03 3.164e-03 -1.084
## under_five_deaths 9.951e-03 4.195e-03 2.372
## bmi 3.534e-01 3.022e-02 11.692
## diphtheria 1.465e-01 1.677e-02 8.733
## hiv_aids -1.186e+00 1.378e-01 -8.607
## gdp 4.622e-04 9.661e-05 4.784
## income_composition_of_resources -1.458e+01 2.907e+00 -5.016
## schooling 1.338e+00 1.875e-01 7.137
## adult_mortality:under_five_deaths -1.128e-06 5.387e-06 -0.209
## adult_mortality:bmi 3.349e-05 5.962e-05 0.562
## adult_mortality:diphtheria -9.966e-05 3.239e-05 -3.076
## adult_mortality:hiv_aids 8.373e-04 6.911e-05 12.115
## adult_mortality:gdp -5.788e-08 1.717e-07 -0.337
## adult_mortality:income_composition_of_resources -7.377e-03 7.182e-03 -1.027
## adult_mortality:schooling -8.181e-04 4.551e-04 -1.798
## under_five_deaths:bmi -3.054e-04 1.111e-04 -2.749
## under_five_deaths:diphtheria -1.373e-05 2.680e-05 -0.512
## under_five_deaths:hiv_aids -6.917e-04 3.445e-04 -2.008
## under_five_deaths:gdp -2.036e-07 5.638e-07 -0.361
## under_five_deaths:income_composition_of_resources 1.716e-02 6.180e-03 2.777
## under_five_deaths:schooling -1.339e-03 5.273e-04 -2.539
## bmi:diphtheria -1.487e-03 2.222e-04 -6.691
## bmi:hiv_aids 4.319e-03 1.999e-03 2.161
## bmi:gdp -2.412e-07 3.413e-07 -0.707
## bmi:income_composition_of_resources 1.020e-02 3.377e-02 0.302
## bmi:schooling -1.633e-02 2.264e-03 -7.214
## diphtheria:hiv_aids -9.062e-04 9.076e-04 -0.999
## diphtheria:gdp 2.030e-07 5.526e-07 0.367
## diphtheria:income_composition_of_resources 6.137e-02 2.334e-02 2.630
## diphtheria:schooling -6.664e-03 1.693e-03 -3.936
## hiv_aids:gdp -3.389e-05 9.961e-06 -3.402
## hiv_aids:income_composition_of_resources 2.017e+00 3.418e-01 5.901
## hiv_aids:schooling -5.147e-02 1.627e-02 -3.164
## gdp:income_composition_of_resources -4.159e-04 1.052e-04 -3.954
## gdp:schooling -3.872e-06 3.694e-06 -1.048
## income_composition_of_resources:schooling 1.435e+00 1.112e-01 12.909
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## adult_mortality 0.278493
## under_five_deaths 0.017768 *
## bmi < 2e-16 ***
## diphtheria < 2e-16 ***
## hiv_aids < 2e-16 ***
## gdp 1.81e-06 ***
## income_composition_of_resources 5.64e-07 ***
## schooling 1.23e-12 ***
## adult_mortality:under_five_deaths 0.834099
## adult_mortality:bmi 0.574404
## adult_mortality:diphtheria 0.002117 **
## adult_mortality:hiv_aids < 2e-16 ***
## adult_mortality:gdp 0.736077
## adult_mortality:income_composition_of_resources 0.304431
## adult_mortality:schooling 0.072369 .
## under_five_deaths:bmi 0.006011 **
## under_five_deaths:diphtheria 0.608422
## under_five_deaths:hiv_aids 0.044778 *
## under_five_deaths:gdp 0.718055
## under_five_deaths:income_composition_of_resources 0.005521 **
## under_five_deaths:schooling 0.011162 *
## bmi:diphtheria 2.70e-11 ***
## bmi:hiv_aids 0.030771 *
## bmi:gdp 0.479841
## bmi:income_composition_of_resources 0.762627
## bmi:schooling 7.12e-13 ***
## diphtheria:hiv_aids 0.318122
## diphtheria:gdp 0.713441
## diphtheria:income_composition_of_resources 0.008597 **
## diphtheria:schooling 8.49e-05 ***
## hiv_aids:gdp 0.000679 ***
## hiv_aids:income_composition_of_resources 4.08e-09 ***
## hiv_aids:schooling 0.001573 **
## gdp:income_composition_of_resources 7.88e-05 ***
## gdp:schooling 0.294661
## income_composition_of_resources:schooling < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.564 on 2598 degrees of freedom
## Multiple R-squared: 0.8615, Adjusted R-squared: 0.8596
## F-statistic: 448.9 on 36 and 2598 DF, p-value: < 2.2e-16
## [1] 3.362962
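The bare value printed above appears to be a test-set RMSE for the preceding model. The helper `calc_rmse` is referenced in the commented-out code in this section, but its definition is not shown. The sketch below is an assumed definition that matches the way it is called (actual values first, predictions second).

```r
# Root mean squared error between observed and predicted values
# (assumed definition; the report's own helper is not shown in this section)
calc_rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted) ^ 2))
}

# Small check on made-up numbers: squared errors are 0, 0 and 4
example_rmse <- calc_rmse(c(1, 2, 3), c(1, 2, 5))
example_rmse
```

With a fitted model `fit` and the hold-out set, the call would look like `calc_rmse(le_tst_data$life_expectancy, predict(fit, newdata = le_tst_data))`.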
# aic_model_backward <- step(sig_linear_model, direction = "backward")
# summary(aic_model_backward)
# par(mfrow = c(2, 2))
# plot(aic_model_backward, col="darkorange", )
# calc_rmse(le_tst_data$life_expectancy,
# predict(aic_model_backward, newdata = le_tst_data))
# aic_model_both <- step(sig_linear_model, direction = "both")
# summary(aic_model_both)
# par(mfrow = c(2, 2))
# plot(aic_model_both, col="darkorange", )
# calc_rmse(le_tst_data$life_expectancy,
# predict(aic_model_both, newdata = le_tst_data))
# bic_model_both <- step(sig_linear_model, k = log(length(resid(sig_linear_model))), direction = "both")
# summary(bic_model_both)
# par(mfrow = c(2, 2))
# plot(bic_model_both, col="darkorange", )
# calc_rmse(le_tst_data$life_expectancy,
# predict(bic_model_both, newdata = le_tst_data))
# sig_linear_model_sq <- lm(life_expectancy ^ 2 ~ adult_mortality + alcohol + percentage_expenditure +
# under_five_deaths + bmi + diphtheria + hiv_aids + gdp +
# income_composition_of_resources + schooling, data = non_cat_predictor_df)
# summary(sig_linear_model_sq)
# par(mfrow = c(2, 2))
# plot(sig_linear_model_sq, col="darkorange", )
# shapiro.test(resid(sig_linear_model_sq))
# calc_rmse(le_tst_data$life_expectancy,
# predict(sig_linear_model_sq, newdata = le_tst_data))
# aic_model <- step(sig_linear_model, direction = "both", test = "F")
# par(mfrow = c(2, 2))
# plot(aic_model)
lm_model <- lm(life_expectancy ~ income_composition_of_resources + adult_mortality +
bmi + status + under_five_deaths,
data = non_cat_predictor_df)
par(mfrow = c(2, 2))
plot(lm_model)
## [1] 4.766326
lm_model_cube <- lm(life_expectancy ^ 3 ~ income_composition_of_resources + adult_mortality +
bmi + status + under_five_deaths,
data = non_cat_predictor_df)
summary(lm_model_cube)
##
## Call:
## lm(formula = life_expectancy^3 ~ income_composition_of_resources +
## adult_mortality + bmi + status + under_five_deaths, data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -264191 -33656 -3659 31225 340598
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 281961.262 7959.723 35.424 < 2e-16 ***
## income_composition_of_resources 235398.426 8317.241 28.302 < 2e-16 ***
## adult_mortality -388.140 12.330 -31.478 < 2e-16 ***
## bmi 989.076 79.225 12.484 < 2e-16 ***
## statusDeveloping -61380.933 3979.585 -15.424 < 2e-16 ***
## under_five_deaths -53.334 8.747 -6.097 1.24e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 67240 on 2629 degrees of freedom
## Multiple R-squared: 0.7294, Adjusted R-squared: 0.7289
## F-statistic: 1417 on 5 and 2629 DF, p-value: < 2.2e-16
## [1] 363332.2
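The value printed above is several orders of magnitude larger than the other error values in this report. A likely explanation (an assumption, since the code that produced it is not shown) is that the predictions of the cubed-response model were compared with life_expectancy on its original scale. The sketch below, on made-up data, shows how much the result changes once the predictions are transformed back with a cube root before computing the RMSE. All names in it are hypothetical.

```r
set.seed(7)
n <- 200
x <- runif(n, 0, 1)
y <- 50 + 30 * x + rnorm(n, sd = 2)  # response on the original scale
toy <- data.frame(x = x, y = y)
trn <- toy[1:150, ]
tst <- toy[151:200, ]

# Model fitted on the cubed response, as in lm_model_cube
fit_cube <- lm(I(y ^ 3) ~ x, data = trn)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted) ^ 2))

pred_cubed <- predict(fit_cube, newdata = tst)        # predictions of y^3

rmse_mismatched <- rmse(tst$y, pred_cubed)            # y versus y^3: scales differ
rmse_original   <- rmse(tst$y, pred_cubed ^ (1 / 3))  # both on the scale of y

c(mismatched = rmse_mismatched, original = rmse_original)
```

Only the back-transformed value is comparable with the RMSE of models fitted to the untransformed response. The cube root used here is safe because all the predictions are positive.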
model1 <- lm(life_expectancy ~ schooling + bmi + alcohol + gdp + hiv_aids + diphtheria + status, data = non_cat_predictor_df)
summary(model1)
##
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + alcohol + gdp +
## hiv_aids + diphtheria + status, data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.9508 -2.8516 -0.0173 2.8691 21.4125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.884e+01 5.877e-01 83.102 < 2e-16 ***
## schooling 1.275e+00 4.028e-02 31.649 < 2e-16 ***
## bmi 6.391e-02 5.458e-03 11.709 < 2e-16 ***
## alcohol -4.988e-02 2.959e-02 -1.686 0.092 .
## gdp 6.939e-05 7.751e-06 8.953 < 2e-16 ***
## hiv_aids -6.794e-01 1.816e-02 -37.403 < 2e-16 ***
## diphtheria 6.523e-02 4.186e-03 15.585 < 2e-16 ***
## statusDeveloping -2.146e+00 3.201e-01 -6.703 2.49e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.559 on 2627 degrees of freedom
## Multiple R-squared: 0.7708, Adjusted R-squared: 0.7702
## F-statistic: 1262 on 7 and 2627 DF, p-value: < 2.2e-16
## [1] 4.39782
## Start: AIC=8003.26
## life_expectancy ~ schooling + bmi + alcohol + gdp + hiv_aids +
## diphtheria + status
##
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 54604 8003.3
## - alcohol 1 59.1 54663 8004.1 2.8412 0.092 .
## - status 1 933.9 55538 8045.9 44.9326 2.487e-11 ***
## - gdp 1 1666.1 56270 8080.5 80.1564 < 2.2e-16 ***
## - bmi 1 2849.8 57453 8135.3 137.1063 < 2.2e-16 ***
## - diphtheria 1 5048.8 59652 8234.3 242.9016 < 2.2e-16 ***
## - schooling 1 20820.0 75424 8852.4 1001.6563 < 2.2e-16 ***
## - hiv_aids 1 29079.2 83683 9126.2 1399.0111 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + alcohol + gdp +
## hiv_aids + diphtheria + status, data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.9508 -2.8516 -0.0173 2.8691 21.4125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.884e+01 5.877e-01 83.102 < 2e-16 ***
## schooling 1.275e+00 4.028e-02 31.649 < 2e-16 ***
## bmi 6.391e-02 5.458e-03 11.709 < 2e-16 ***
## alcohol -4.988e-02 2.959e-02 -1.686 0.092 .
## gdp 6.939e-05 7.751e-06 8.953 < 2e-16 ***
## hiv_aids -6.794e-01 1.816e-02 -37.403 < 2e-16 ***
## diphtheria 6.523e-02 4.186e-03 15.585 < 2e-16 ***
## statusDeveloping -2.146e+00 3.201e-01 -6.703 2.49e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.559 on 2627 degrees of freedom
## Multiple R-squared: 0.7708, Adjusted R-squared: 0.7702
## F-statistic: 1262 on 7 and 2627 DF, p-value: < 2.2e-16
## [1] 4.39782
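The AIC values in the step() table above can be reproduced, up to rounding of the printed RSS, from the formula that `step()` uses for linear models: n log(RSS / n) + 2p, where p is the number of estimated coefficients. In the sketch below, n = 2635 is inferred from the 2627 residual degrees of freedom plus 8 coefficients, and `aic_from_rss` is a hypothetical helper name.

```r
# AIC as reported by step() for lm fits, up to an additive constant
aic_from_rss <- function(rss, n, p) n * log(rss / n) + 2 * p

n <- 2635  # 2627 residual degrees of freedom + 8 estimated coefficients

aic_full       <- aic_from_rss(54604, n, p = 8)  # current model, row <none>
aic_no_alcohol <- aic_from_rss(54663, n, p = 7)  # after dropping alcohol

round(c(full = aic_full, no_alcohol = aic_no_alcohol), 1)
```

Dropping alcohol raises the RSS by enough to outweigh the penalty saved by removing one parameter, so the AIC goes up slightly. This is why the procedure keeps alcohol even though its t-test p-value (0.092) is above 0.05.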
model1 <- lm(life_expectancy ~ schooling + bmi + gdp * status + hiv_aids + diphtheria, data = non_cat_predictor_df)
summary(model1)
##
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + gdp * status +
## hiv_aids + diphtheria, data = non_cat_predictor_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.9932 -2.9250 0.0812 2.9494 20.7958
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.972e+01 6.390e-01 77.815 < 2e-16 ***
## schooling 1.234e+00 3.892e-02 31.715 < 2e-16 ***
## bmi 6.029e-02 5.495e-03 10.972 < 2e-16 ***
## gdp 4.706e-05 9.683e-06 4.860 1.24e-06 ***
## statusDeveloping -2.742e+00 3.593e-01 -7.631 3.23e-14 ***
## hiv_aids -6.809e-01 1.797e-02 -37.887 < 2e-16 ***
## diphtheria 6.451e-02 4.181e-03 15.430 < 2e-16 ***
## gdp:statusDeveloping 6.133e-05 1.588e-05 3.862 0.000115 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.549 on 2627 degrees of freedom
## Multiple R-squared: 0.7718, Adjusted R-squared: 0.7712
## F-statistic: 1269 on 7 and 2627 DF, p-value: < 2.2e-16
## [1] 4.370292
model1 <- lm(life_expectancy ~ (schooling + bmi + gdp + hiv_aids + diphtheria) * status, data = le_trn_data)
summary(model1)
##
## Call:
## lm(formula = life_expectancy ~ (schooling + bmi + gdp + hiv_aids +
## diphtheria) * status, data = le_trn_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.2444 -2.7806 0.0821 2.7248 21.0926
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.400e+01 2.818e+00 22.708 < 2e-16 ***
## schooling 8.323e-01 1.284e-01 6.483 1.07e-10 ***
## bmi -1.775e-02 1.238e-02 -1.434 0.15155
## gdp 5.041e-05 9.706e-06 5.193 2.23e-07 ***
## hiv_aids -6.709e-01 1.778e-02 -37.737 < 2e-16 ***
## diphtheria 2.186e-02 1.729e-02 1.264 0.20636
## statusDeveloping -1.745e+01 2.852e+00 -6.118 1.09e-09 ***
## schooling:statusDeveloping 3.814e-01 1.349e-01 2.828 0.00472 **
## bmi:statusDeveloping 9.622e-02 1.378e-02 6.984 3.63e-12 ***
## gdp:statusDeveloping 4.708e-05 1.590e-05 2.962 0.00309 **
## hiv_aids:statusDeveloping NA NA NA NA
## diphtheria:statusDeveloping 4.321e-02 1.782e-02 2.425 0.01539 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.489 on 2624 degrees of freedom
## Multiple R-squared: 0.778, Adjusted R-squared: 0.7772
## F-statistic: 919.6 on 10 and 2624 DF, p-value: < 2.2e-16
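The NA row for hiv_aids:statusDeveloping means that this column of the design matrix is an exact linear combination of other columns. One possible cause, which is an assumption and has not been checked against the data here, is that hiv_aids takes a single constant value for every Developed observation. The toy example below reproduces that situation with made-up variables `x`, `y` and `status`.

```r
set.seed(3)
n <- 100
status <- factor(rep(c("Developed", "Developing"), each = n / 2))

# x is constant (0.1) for every Developed row and varies for Developing rows
x <- ifelse(status == "Developed", 0.1, runif(n, 0.1, 10))
y <- 70 - 0.5 * x - 3 * (status == "Developing") + rnorm(n)

fit <- lm(y ~ x * status)
coef(fit)
```

Here the interaction column equals `x - 0.1 + 0.1 * (status == "Developing")`, so it carries no new information and `lm()` reports its coefficient as NA. The remaining coefficients are still estimated.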
## Start: AIC=7924.86
## life_expectancy ~ (schooling + bmi + gdp + hiv_aids + diphtheria) *
## status
##
##
## Step: AIC=7924.86
## life_expectancy ~ schooling + bmi + gdp + hiv_aids + diphtheria +
## status + schooling:status + bmi:status + gdp:status + diphtheria:status
##
## Df Sum of Sq RSS AIC
## <none> 52882 7924.9
## - diphtheria:status 1 118.5 53001 7928.8
## - schooling:status 1 161.2 53044 7930.9
## - gdp:status 1 176.8 53059 7931.6
## - bmi:status 1 982.9 53865 7971.4
## - hiv_aids 1 28699.7 81582 9065.2
##
## Call:
## lm(formula = life_expectancy ~ schooling + bmi + gdp + hiv_aids +
## diphtheria + status + schooling:status + bmi:status + gdp:status +
## diphtheria:status, data = le_trn_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.2444 -2.7806 0.0821 2.7248 21.0926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.400e+01 2.818e+00 22.708 < 2e-16 ***
## schooling 8.323e-01 1.284e-01 6.483 1.07e-10 ***
## bmi -1.775e-02 1.238e-02 -1.434 0.15155
## gdp 5.041e-05 9.706e-06 5.193 2.23e-07 ***
## hiv_aids -6.709e-01 1.778e-02 -37.737 < 2e-16 ***
## diphtheria 2.186e-02 1.729e-02 1.264 0.20636
## statusDeveloping -1.745e+01 2.852e+00 -6.118 1.09e-09 ***
## schooling:statusDeveloping 3.814e-01 1.349e-01 2.828 0.00472 **
## bmi:statusDeveloping 9.622e-02 1.378e-02 6.984 3.63e-12 ***
## gdp:statusDeveloping 4.708e-05 1.590e-05 2.962 0.00309 **
## diphtheria:statusDeveloping 4.321e-02 1.782e-02 2.425 0.01539 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.489 on 2624 degrees of freedom
## Multiple R-squared: 0.778, Adjusted R-squared: 0.7772
## F-statistic: 919.6 on 10 and 2624 DF, p-value: < 2.2e-16
regs <- regsubsets(life_expectancy ~ ., data = non_cat_predictor_df, nbest = 10)
plot(regs,
     scale = "adjr",
     main = "All possible regression: ranked by Adjusted R-squared")
model1 <- lm(life_expectancy ~ adult_mortality + bmi + hiv_aids +
               income_composition_of_resources + schooling, data = le_trn_data)
summary(model1)
##
## Call:
## lm(formula = life_expectancy ~ adult_mortality + bmi + hiv_aids +
## income_composition_of_resources + schooling, data = le_trn_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.5054 -2.1955 -0.1338 2.2222 23.3580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.8491547 0.4361749 123.458 <2e-16 ***
## adult_mortality -0.0201389 0.0008887 -22.662 <2e-16 ***
## bmi 0.0504595 0.0052085 9.688 <2e-16 ***
## hiv_aids -0.4867176 0.0191328 -25.439 <2e-16 ***
## income_composition_of_resources 9.0596858 0.6923758 13.085 <2e-16 ***
## schooling 0.9977424 0.0451093 22.118 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.322 on 2629 degrees of freedom
## Multiple R-squared: 0.7938, Adjusted R-squared: 0.7934
## F-statistic: 2024 on 5 and 2629 DF, p-value: < 2.2e-16
outliers_out <- boxplot(le_trn_data$life_expectancy, plot = FALSE)$out # to get the outliers
life_clean <- le_trn_data[-which(le_trn_data$life_expectancy %in% outliers_out), ]
nrow(life_clean)
## [1] 2624
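One caveat with the `-which(...)` idiom used for the outlier filter: if no outliers are found, `which()` returns an empty vector and the negative index selects zero rows instead of all of them. A minimal sketch of a logical-index alternative follows; the `toy` and `toy_ok` data frames and their values are illustrative assumptions, not the project data.

```r
# Hypothetical toy data standing in for le_trn_data.
toy <- data.frame(life_expectancy = c(70, 72, 71, 69, 73, 30))

# boxplot.stats() yields the same $out vector that boxplot(..., plot = FALSE) reports.
outliers_out <- boxplot.stats(toy$life_expectancy)$out

# Logical negation keeps every row when outliers_out is empty.
toy_clean <- toy[!(toy$life_expectancy %in% outliers_out), , drop = FALSE]
nrow(toy_clean)

# Edge case: data with no outliers.
toy_ok <- data.frame(life_expectancy = c(70, 72, 71, 69, 73))
no_out <- boxplot.stats(toy_ok$life_expectancy)$out
n_logical <- nrow(toy_ok[!(toy_ok$life_expectancy %in% no_out), , drop = FALSE])
n_which   <- nrow(toy_ok[-which(toy_ok$life_expectancy %in% no_out), , drop = FALSE])
c(logical_index = n_logical, negative_which = n_which)
```

In the run shown in this report outliers were present, so both forms would give the same `life_clean`; the logical form only matters when the filter is reused on data without outliers.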
model1 <- lm(life_expectancy ~ adult_mortality + bmi + hiv_aids +
income_composition_of_resources + schooling, data = life_clean)
summary(model1)
##
## Call:
## lm(formula = life_expectancy ~ adult_mortality + bmi + hiv_aids +
## income_composition_of_resources + schooling, data = life_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.1504 -2.2166 -0.1487 2.2184 23.1675
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.9791795 0.4307309 125.320 <2e-16 ***
## adult_mortality -0.0197288 0.0008879 -22.219 <2e-16 ***
## bmi 0.0504382 0.0051161 9.859 <2e-16 ***
## hiv_aids -0.4920254 0.0190763 -25.792 <2e-16 ***
## income_composition_of_resources 8.8965947 0.6799516 13.084 <2e-16 ***
## schooling 0.9944462 0.0442962 22.450 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.242 on 2618 degrees of freedom
## Multiple R-squared: 0.7952, Adjusted R-squared: 0.7948
## F-statistic: 2033 on 5 and 2618 DF, p-value: < 2.2e-16
sig_linear_model <- lm(life_expectancy ~ adult_mortality +
under_five_deaths + bmi + diphtheria + hiv_aids + gdp +
income_composition_of_resources + schooling + status,
data = life_clean)
summary(sig_linear_model)
##
## Call:
## lm(formula = life_expectancy ~ adult_mortality + under_five_deaths +
## bmi + diphtheria + hiv_aids + gdp + income_composition_of_resources +
## schooling + status, data = life_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.0050 -2.3594 -0.1452 2.3826 18.8654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.434e+01 5.799e-01 93.714 < 2e-16 ***
## adult_mortality -1.819e-02 8.404e-04 -21.643 < 2e-16 ***
## under_five_deaths -2.498e-03 5.253e-04 -4.754 2.10e-06 ***
## bmi 4.283e-02 4.863e-03 8.807 < 2e-16 ***
## diphtheria 5.252e-02 3.717e-03 14.131 < 2e-16 ***
## hiv_aids -4.950e-01 1.794e-02 -27.587 < 2e-16 ***
## gdp 4.778e-05 6.855e-06 6.970 4.00e-12 ***
## income_composition_of_resources 6.912e+00 6.522e-01 10.598 < 2e-16 ***
## schooling 7.872e-01 4.347e-02 18.109 < 2e-16 ***
## statusDeveloping -1.412e+00 2.538e-01 -5.563 2.92e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.987 on 2614 degrees of freedom
## Multiple R-squared: 0.8193, Adjusted R-squared: 0.8187
## F-statistic: 1317 on 9 and 2614 DF, p-value: < 2.2e-16
## Start: AIC=7118.86
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + hepatitis_b + measles +
## bmi + under_five_deaths + polio + total_expenditure + diphtheria +
## hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years +
## income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## - thinness_5_9_years 1 0.2 38956 7116.9
## - hepatitis_b 1 0.5 38957 7116.9
## - population 1 2.6 38959 7117.0
## - alcohol 1 26.2 38982 7118.6
## <none> 38956 7118.9
## - thinness_1_19_years 1 34.7 38991 7119.2
## - total_expenditure 1 39.9 38996 7119.5
## - measles 1 60.9 39017 7121.0
## - percentage_expenditure 1 66.0 39022 7121.3
## - gdp 1 97.6 39054 7123.4
## - status 1 283.0 39239 7135.9
## - polio 1 432.3 39388 7145.8
## - diphtheria 1 625.6 39582 7158.7
## - bmi 1 775.8 39732 7168.6
## - income_composition_of_resources 1 1407.5 40364 7210.0
## - infant_deaths 1 1640.7 40597 7225.1
## - under_five_deaths 1 1708.9 40665 7229.5
## - schooling 1 4021.5 42978 7374.6
## - adult_mortality 1 6958.3 45914 7548.1
## - hiv_aids 1 11167.2 50123 7778.2
##
## Step: AIC=7116.87
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + hepatitis_b + measles +
## bmi + under_five_deaths + polio + total_expenditure + diphtheria +
## hiv_aids + gdp + population + thinness_1_19_years + income_composition_of_resources +
## schooling
##
## Df Sum of Sq RSS AIC
## - hepatitis_b 1 0.5 38957 7114.9
## - population 1 2.6 38959 7115.0
## - alcohol 1 26.3 38983 7116.6
## <none> 38956 7116.9
## - total_expenditure 1 40.3 38997 7117.6
## + thinness_5_9_years 1 0.2 38956 7118.9
## - measles 1 60.8 39017 7119.0
## - percentage_expenditure 1 66.0 39022 7119.3
## - gdp 1 97.7 39054 7121.4
## - thinness_1_19_years 1 158.6 39115 7125.5
## - status 1 283.4 39240 7133.9
## - polio 1 432.8 39389 7143.9
## - diphtheria 1 625.5 39582 7156.7
## - bmi 1 790.2 39746 7167.6
## - income_composition_of_resources 1 1407.3 40364 7208.0
## - infant_deaths 1 1644.5 40601 7223.4
## - under_five_deaths 1 1710.8 40667 7227.6
## - schooling 1 4022.3 42979 7372.7
## - adult_mortality 1 6968.0 45924 7546.7
## - hiv_aids 1 11181.9 50138 7777.0
##
## Step: AIC=7114.9
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## population + thinness_1_19_years + income_composition_of_resources +
## schooling
##
## Df Sum of Sq RSS AIC
## - population 1 2.7 38959 7113.1
## - alcohol 1 27.0 38984 7114.7
## <none> 38957 7114.9
## - total_expenditure 1 40.1 38997 7115.6
## + hepatitis_b 1 0.5 38956 7116.9
## + thinness_5_9_years 1 0.2 38957 7116.9
## - measles 1 60.7 39018 7117.0
## - percentage_expenditure 1 67.4 39024 7117.4
## - gdp 1 97.3 39054 7119.4
## - thinness_1_19_years 1 160.3 39117 7123.7
## - status 1 283.0 39240 7131.9
## - polio 1 439.5 39396 7142.3
## - diphtheria 1 722.4 39679 7161.1
## - bmi 1 789.9 39747 7165.6
## - income_composition_of_resources 1 1410.0 40367 7206.2
## - infant_deaths 1 1651.6 40608 7221.9
## - under_five_deaths 1 1716.1 40673 7226.0
## - schooling 1 4036.0 42993 7371.6
## - adult_mortality 1 6969.2 45926 7544.8
## - hiv_aids 1 11193.2 50150 7775.6
##
## Step: AIC=7113.09
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## - alcohol 1 27.2 38987 7112.9
## <none> 38959 7113.1
## - total_expenditure 1 39.7 38999 7113.8
## + population 1 2.7 38957 7114.9
## + hepatitis_b 1 0.6 38959 7115.0
## + thinness_5_9_years 1 0.2 38959 7115.1
## - measles 1 63.5 39023 7115.4
## - percentage_expenditure 1 67.3 39027 7115.6
## - gdp 1 97.5 39057 7117.6
## - thinness_1_19_years 1 160.6 39120 7121.9
## - status 1 281.8 39241 7130.0
## - polio 1 438.6 39398 7140.5
## - diphtheria 1 725.7 39685 7159.5
## - bmi 1 791.4 39751 7163.9
## - income_composition_of_resources 1 1409.6 40369 7204.3
## - infant_deaths 1 1713.3 40673 7224.0
## - under_five_deaths 1 1744.0 40703 7226.0
## - schooling 1 4048.6 43008 7370.5
## - adult_mortality 1 6976.5 45936 7543.3
## - hiv_aids 1 11193.5 50153 7773.8
##
## Step: AIC=7112.92
## life_expectancy ~ status + adult_mortality + infant_deaths +
## percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## <none> 38987 7112.9
## + alcohol 1 27.2 38959 7113.1
## - total_expenditure 1 45.6 39032 7114.0
## + population 1 3.0 38984 7114.7
## + hepatitis_b 1 1.3 38985 7114.8
## + thinness_5_9_years 1 0.4 38986 7114.9
## - measles 1 61.9 39049 7115.1
## - percentage_expenditure 1 71.4 39058 7115.7
## - gdp 1 93.2 39080 7117.2
## - thinness_1_19_years 1 197.5 39184 7124.2
## - status 1 416.7 39403 7138.8
## - polio 1 443.1 39430 7140.6
## - diphtheria 1 729.0 39716 7159.5
## - bmi 1 790.0 39777 7163.6
## - income_composition_of_resources 1 1415.8 40403 7204.5
## - infant_deaths 1 1686.2 40673 7222.0
## - under_five_deaths 1 1717.2 40704 7224.0
## - schooling 1 4406.1 43393 7391.9
## - adult_mortality 1 6954.9 45942 7541.7
## - hiv_aids 1 11179.4 50166 7772.5
##
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths +
## percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling,
## data = subset(life_clean, select = -c(year, country, continent,
## region)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.5984 -2.3398 -0.1235 2.2619 17.5405
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.524e+01 6.431e-01 85.891 < 2e-16 ***
## statusDeveloping -1.346e+00 2.550e-01 -5.280 1.40e-07 ***
## adult_mortality -1.766e-02 8.189e-04 -21.570 < 2e-16 ***
## infant_deaths 8.870e-02 8.351e-03 10.621 < 2e-16 ***
## percentage_expenditure 1.649e-04 7.544e-05 2.186 0.028931 *
## measles -1.548e-05 7.605e-06 -2.036 0.041890 *
## bmi 3.646e-02 5.016e-03 7.270 4.74e-13 ***
## under_five_deaths -6.602e-02 6.160e-03 -10.718 < 2e-16 ***
## polio 2.482e-02 4.559e-03 5.445 5.68e-08 ***
## total_expenditure 5.908e-02 3.383e-02 1.746 0.080870 .
## diphtheria 3.168e-02 4.536e-03 6.983 3.64e-12 ***
## hiv_aids -4.802e-01 1.756e-02 -27.347 < 2e-16 ***
## gdp 2.909e-05 1.165e-05 2.496 0.012609 *
## thinness_1_19_years -8.538e-02 2.349e-02 -3.635 0.000283 ***
## income_composition_of_resources 6.227e+00 6.399e-01 9.732 < 2e-16 ***
## schooling 7.365e-01 4.290e-02 17.168 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.866 on 2608 degrees of freedom
## Multiple R-squared: 0.8305, Adjusted R-squared: 0.8295
## F-statistic: 851.7 on 15 and 2608 DF, p-value: < 2.2e-16
## Start: AIC=7118.86
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + hepatitis_b + measles +
## bmi + under_five_deaths + polio + total_expenditure + diphtheria +
## hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years +
## income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## - thinness_5_9_years 1 0.2 38956 7116.9
## - hepatitis_b 1 0.5 38957 7116.9
## - population 1 2.6 38959 7117.0
## - alcohol 1 26.2 38982 7118.6
## <none> 38956 7118.9
## - thinness_1_19_years 1 34.7 38991 7119.2
## - total_expenditure 1 39.9 38996 7119.5
## - measles 1 60.9 39017 7121.0
## - percentage_expenditure 1 66.0 39022 7121.3
## - gdp 1 97.6 39054 7123.4
## - status 1 283.0 39239 7135.9
## - polio 1 432.3 39388 7145.8
## - diphtheria 1 625.6 39582 7158.7
## - bmi 1 775.8 39732 7168.6
## - income_composition_of_resources 1 1407.5 40364 7210.0
## - infant_deaths 1 1640.7 40597 7225.1
## - under_five_deaths 1 1708.9 40665 7229.5
## - schooling 1 4021.5 42978 7374.6
## - adult_mortality 1 6958.3 45914 7548.1
## - hiv_aids 1 11167.2 50123 7778.2
##
## Step: AIC=7116.87
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + hepatitis_b + measles +
## bmi + under_five_deaths + polio + total_expenditure + diphtheria +
## hiv_aids + gdp + population + thinness_1_19_years + income_composition_of_resources +
## schooling
##
## Df Sum of Sq RSS AIC
## - hepatitis_b 1 0.5 38957 7114.9
## - population 1 2.6 38959 7115.0
## - alcohol 1 26.3 38983 7116.6
## <none> 38956 7116.9
## - total_expenditure 1 40.3 38997 7117.6
## - measles 1 60.8 39017 7119.0
## - percentage_expenditure 1 66.0 39022 7119.3
## - gdp 1 97.7 39054 7121.4
## - thinness_1_19_years 1 158.6 39115 7125.5
## - status 1 283.4 39240 7133.9
## - polio 1 432.8 39389 7143.9
## - diphtheria 1 625.5 39582 7156.7
## - bmi 1 790.2 39746 7167.6
## - income_composition_of_resources 1 1407.3 40364 7208.0
## - infant_deaths 1 1644.5 40601 7223.4
## - under_five_deaths 1 1710.8 40667 7227.6
## - schooling 1 4022.3 42979 7372.7
## - adult_mortality 1 6968.0 45924 7546.7
## - hiv_aids 1 11181.9 50138 7777.0
##
## Step: AIC=7114.9
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## population + thinness_1_19_years + income_composition_of_resources +
## schooling
##
## Df Sum of Sq RSS AIC
## - population 1 2.7 38959 7113.1
## - alcohol 1 27.0 38984 7114.7
## <none> 38957 7114.9
## - total_expenditure 1 40.1 38997 7115.6
## - measles 1 60.7 39018 7117.0
## - percentage_expenditure 1 67.4 39024 7117.4
## - gdp 1 97.3 39054 7119.4
## - thinness_1_19_years 1 160.3 39117 7123.7
## - status 1 283.0 39240 7131.9
## - polio 1 439.5 39396 7142.3
## - diphtheria 1 722.4 39679 7161.1
## - bmi 1 789.9 39747 7165.6
## - income_composition_of_resources 1 1410.0 40367 7206.2
## - infant_deaths 1 1651.6 40608 7221.9
## - under_five_deaths 1 1716.1 40673 7226.0
## - schooling 1 4036.0 42993 7371.6
## - adult_mortality 1 6969.2 45926 7544.8
## - hiv_aids 1 11193.2 50150 7775.6
##
## Step: AIC=7113.09
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## - alcohol 1 27.2 38987 7112.9
## <none> 38959 7113.1
## - total_expenditure 1 39.7 38999 7113.8
## - measles 1 63.5 39023 7115.4
## - percentage_expenditure 1 67.3 39027 7115.6
## - gdp 1 97.5 39057 7117.6
## - thinness_1_19_years 1 160.6 39120 7121.9
## - status 1 281.8 39241 7130.0
## - polio 1 438.6 39398 7140.5
## - diphtheria 1 725.7 39685 7159.5
## - bmi 1 791.4 39751 7163.9
## - income_composition_of_resources 1 1409.6 40369 7204.3
## - infant_deaths 1 1713.3 40673 7224.0
## - under_five_deaths 1 1744.0 40703 7226.0
## - schooling 1 4048.6 43008 7370.5
## - adult_mortality 1 6976.5 45936 7543.3
## - hiv_aids 1 11193.5 50153 7773.8
##
## Step: AIC=7112.92
## life_expectancy ~ status + adult_mortality + infant_deaths +
## percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## <none> 38987 7112.9
## - total_expenditure 1 45.6 39032 7114.0
## - measles 1 61.9 39049 7115.1
## - percentage_expenditure 1 71.4 39058 7115.7
## - gdp 1 93.2 39080 7117.2
## - thinness_1_19_years 1 197.5 39184 7124.2
## - status 1 416.7 39403 7138.8
## - polio 1 443.1 39430 7140.6
## - diphtheria 1 729.0 39716 7159.5
## - bmi 1 790.0 39777 7163.6
## - income_composition_of_resources 1 1415.8 40403 7204.5
## - infant_deaths 1 1686.2 40673 7222.0
## - under_five_deaths 1 1717.2 40704 7224.0
## - schooling 1 4406.1 43393 7391.9
## - adult_mortality 1 6954.9 45942 7541.7
## - hiv_aids 1 11179.4 50166 7772.5
##
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths +
## percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling,
## data = subset(life_clean, select = -c(year, country, continent,
## region)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.5984 -2.3398 -0.1235 2.2619 17.5405
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.524e+01 6.431e-01 85.891 < 2e-16 ***
## statusDeveloping -1.346e+00 2.550e-01 -5.280 1.40e-07 ***
## adult_mortality -1.766e-02 8.189e-04 -21.570 < 2e-16 ***
## infant_deaths 8.870e-02 8.351e-03 10.621 < 2e-16 ***
## percentage_expenditure 1.649e-04 7.544e-05 2.186 0.028931 *
## measles -1.548e-05 7.605e-06 -2.036 0.041890 *
## bmi 3.646e-02 5.016e-03 7.270 4.74e-13 ***
## under_five_deaths -6.602e-02 6.160e-03 -10.718 < 2e-16 ***
## polio 2.482e-02 4.559e-03 5.445 5.68e-08 ***
## total_expenditure 5.908e-02 3.383e-02 1.746 0.080870 .
## diphtheria 3.168e-02 4.536e-03 6.983 3.64e-12 ***
## hiv_aids -4.802e-01 1.756e-02 -27.347 < 2e-16 ***
## gdp 2.909e-05 1.165e-05 2.496 0.012609 *
## thinness_1_19_years -8.538e-02 2.349e-02 -3.635 0.000283 ***
## income_composition_of_resources 6.227e+00 6.399e-01 9.732 < 2e-16 ***
## schooling 7.365e-01 4.290e-02 17.168 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.866 on 2608 degrees of freedom
## Multiple R-squared: 0.8305, Adjusted R-squared: 0.8295
## F-statistic: 851.7 on 15 and 2608 DF, p-value: < 2.2e-16
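The two traces above have the shape of `stats::step()` output, the first with terms offered for re-entry (`+` rows) and the second with drops only. The calls themselves are not shown in this report, so the following is only a sketch of how such searches are typically set up, run on the built-in `mtcars` data rather than `life_clean`. The BIC variant (penalty `k = log(n)`) is included because a model named `clean_bic` is fitted next; that it came from such a search is an assumption.

```r
# Full and intercept-only models on a built-in data set (stand-in for life_clean).
full <- lm(mpg ~ ., data = mtcars)
none <- lm(mpg ~ 1, data = mtcars)
n <- nrow(mtcars)

# Stepwise search in both directions starting from the full model (AIC, k = 2).
sel_both <- step(full, scope = list(lower = formula(none), upper = formula(full)),
                 direction = "both", trace = 0)

# Backward elimination with AIC, then with the BIC penalty k = log(n).
sel_aic <- step(full, direction = "backward", trace = 0)
sel_bic <- step(full, direction = "backward", k = log(n), trace = 0)

formula(sel_aic)
formula(sel_bic)
```

Setting `trace = 1` (the default) prints a per-step table like the ones above. BIC penalizes each extra parameter more heavily than AIC once `n` exceeds about 7, so it tends to retain fewer predictors.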
clean_bic <- lm(life_expectancy ~ status + adult_mortality + infant_deaths +
                  bmi + under_five_deaths +
                  polio + diphtheria + hiv_aids + income_composition_of_resources + schooling,
                data = life_clean)
summary(clean_bic)
##
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths +
## +bmi + under_five_deaths + polio + diphtheria + hiv_aids +
## income_composition_of_resources + schooling, data = life_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.7994 -2.2965 -0.1041 2.2529 18.5435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.0840506 0.5905404 93.277 < 2e-16 ***
## statusDeveloping -2.1741396 0.2398023 -9.066 < 2e-16 ***
## adult_mortality -0.0179869 0.0008269 -21.752 < 2e-16 ***
## infant_deaths 0.0846174 0.0084069 10.065 < 2e-16 ***
## bmi 0.0446784 0.0047838 9.339 < 2e-16 ***
## under_five_deaths -0.0644595 0.0061863 -10.420 < 2e-16 ***
## polio 0.0248476 0.0046206 5.378 8.22e-08 ***
## diphtheria 0.0314429 0.0045996 6.836 1.01e-11 ***
## hiv_aids -0.4799068 0.0176931 -27.124 < 2e-16 ***
## income_composition_of_resources 6.8176464 0.6401068 10.651 < 2e-16 ***
## schooling 0.7794565 0.0429254 18.158 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.924 on 2613 degrees of freedom
## Multiple R-squared: 0.825, Adjusted R-squared: 0.8244
## F-statistic: 1232 on 10 and 2613 DF, p-value: < 2.2e-16
## [1] 2525
clean_bic <- lm(life_expectancy ~ status + adult_mortality + infant_deaths +
                  bmi + under_five_deaths +
                  polio + diphtheria + hiv_aids + income_composition_of_resources + schooling,
                data = life_clean)
summary(clean_bic)
##
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths +
## +bmi + under_five_deaths + polio + diphtheria + hiv_aids +
## income_composition_of_resources + schooling, data = life_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.3121 -2.2247 -0.1143 2.1593 17.4907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.2772489 0.6003860 95.401 < 2e-16 ***
## statusDeveloping -2.2759834 0.2309374 -9.855 < 2e-16 ***
## adult_mortality -0.0194412 0.0009306 -20.891 < 2e-16 ***
## infant_deaths 0.0693023 0.0098036 7.069 2.01e-12 ***
## bmi 0.0389316 0.0046108 8.444 < 2e-16 ***
## under_five_deaths -0.0533229 0.0072975 -7.307 3.65e-13 ***
## polio 0.0220567 0.0045025 4.899 1.03e-06 ***
## diphtheria 0.0261304 0.0044883 5.822 6.56e-09 ***
## hiv_aids -0.5626863 0.0278559 -20.200 < 2e-16 ***
## income_composition_of_resources 6.4577600 0.6205748 10.406 < 2e-16 ***
## schooling 0.7347332 0.0419489 17.515 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.757 on 2514 degrees of freedom
## Multiple R-squared: 0.8036, Adjusted R-squared: 0.8028
## F-statistic: 1029 on 10 and 2514 DF, p-value: < 2.2e-16
life_clean1 <- subset(life_clean, select = -c(year, country, continent, region))
clean_full <- lm(formula = life_expectancy ~ ., data = life_clean1)
clean_none <- lm(formula = life_expectancy ~ 1, data = life_clean1)
## Start: AIC=6602.45
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + hepatitis_b + measles +
## bmi + under_five_deaths + polio + total_expenditure + diphtheria +
## hiv_aids + gdp + population + thinness_1_19_years + thinness_5_9_years +
## income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## - hepatitis_b 1 0.3 33962 6600.5
## - population 1 1.6 33963 6600.6
## - thinness_5_9_years 1 3.1 33965 6600.7
## <none> 33962 6602.4
## - thinness_1_19_years 1 39.9 34001 6603.4
## - total_expenditure 1 59.0 34021 6604.8
## - alcohol 1 63.0 34025 6605.1
## - percentage_expenditure 1 71.7 34033 6605.8
## - measles 1 77.2 34039 6606.2
## - gdp 1 97.1 34059 6607.7
## - status 1 266.3 34228 6620.2
## - polio 1 316.2 34278 6623.8
## - diphtheria 1 376.3 34338 6628.3
## - bmi 1 458.3 34420 6634.3
## - infant_deaths 1 945.7 34907 6669.8
## - under_five_deaths 1 979.2 34941 6672.2
## - income_composition_of_resources 1 1254.7 35216 6692.0
## - schooling 1 3146.9 37108 6824.2
## - hiv_aids 1 5616.3 39578 6986.9
## - adult_mortality 1 5805.6 39767 6998.9
##
## Step: AIC=6600.47
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## population + thinness_1_19_years + thinness_5_9_years + income_composition_of_resources +
## schooling
##
## Df Sum of Sq RSS AIC
## - population 1 1.6 33963 6598.6
## - thinness_5_9_years 1 3.1 33965 6598.7
## <none> 33962 6600.5
## - thinness_1_19_years 1 39.7 34002 6601.4
## - total_expenditure 1 59.3 34021 6602.9
## - alcohol 1 62.7 34025 6603.1
## - percentage_expenditure 1 71.4 34033 6603.8
## - measles 1 77.1 34039 6604.2
## - gdp 1 97.6 34060 6605.7
## - status 1 267.0 34229 6618.2
## - polio 1 327.5 34289 6622.7
## - bmi 1 458.7 34421 6632.3
## - diphtheria 1 462.1 34424 6632.6
## - infant_deaths 1 945.7 34908 6667.8
## - under_five_deaths 1 979.0 34941 6670.2
## - income_composition_of_resources 1 1254.6 35217 6690.1
## - schooling 1 3170.3 37132 6823.8
## - hiv_aids 1 5618.1 39580 6985.0
## - adult_mortality 1 5805.4 39767 6996.9
##
## Step: AIC=6598.59
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + thinness_5_9_years + income_composition_of_resources +
## schooling
##
## Df Sum of Sq RSS AIC
## - thinness_5_9_years 1 3.3 33967 6596.8
## <none> 33963 6598.6
## - thinness_1_19_years 1 39.3 34003 6599.5
## - total_expenditure 1 58.8 34022 6601.0
## - alcohol 1 63.0 34026 6601.3
## - percentage_expenditure 1 71.3 34035 6601.9
## - measles 1 80.0 34043 6602.5
## - gdp 1 97.8 34061 6603.8
## - status 1 266.1 34230 6616.3
## - polio 1 326.9 34290 6620.8
## - bmi 1 459.2 34423 6630.5
## - diphtheria 1 464.0 34427 6630.8
## - infant_deaths 1 972.5 34936 6667.9
## - under_five_deaths 1 991.5 34955 6669.2
## - income_composition_of_resources 1 1254.5 35218 6688.2
## - schooling 1 3179.1 37143 6822.5
## - hiv_aids 1 5617.8 39581 6983.1
## - adult_mortality 1 5809.1 39773 6995.3
##
## Step: AIC=6596.83
## life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling
##
## Df Sum of Sq RSS AIC
## <none> 33967 6596.8
## - total_expenditure 1 60.3 34027 6599.3
## - alcohol 1 64.2 34031 6599.6
## - percentage_expenditure 1 71.4 34038 6600.1
## - measles 1 78.5 34045 6600.7
## - gdp 1 98.2 34065 6602.1
## - thinness_1_19_years 1 245.9 34213 6613.1
## - status 1 267.0 34234 6614.6
## - polio 1 328.3 34295 6619.1
## - diphtheria 1 462.5 34429 6629.0
## - bmi 1 476.6 34443 6630.0
## - infant_deaths 1 969.4 34936 6665.9
## - under_five_deaths 1 988.2 34955 6667.2
## - income_composition_of_resources 1 1253.2 35220 6686.3
## - schooling 1 3177.0 37144 6820.6
## - hiv_aids 1 5650.2 39617 6983.4
## - adult_mortality 1 5825.7 39792 6994.5
##
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths +
## alcohol + percentage_expenditure + measles + bmi + under_five_deaths +
## polio + total_expenditure + diphtheria + hiv_aids + gdp +
## thinness_1_19_years + income_composition_of_resources + schooling,
## data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.4795 -2.2931 -0.1227 2.1553 15.8087
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.753e+01 6.560e-01 87.700 < 2e-16 ***
## statusDeveloping -1.176e+00 2.650e-01 -4.440 9.38e-06 ***
## adult_mortality -1.905e-02 9.186e-04 -20.740 < 2e-16 ***
## infant_deaths 8.351e-02 9.871e-03 8.460 < 2e-16 ***
## alcohol 5.539e-02 2.545e-02 2.176 0.02962 *
## percentage_expenditure 1.651e-04 7.191e-05 2.296 0.02178 *
## measles -2.092e-05 8.690e-06 -2.407 0.01615 *
## bmi 2.858e-02 4.817e-03 5.932 3.40e-09 ***
## under_five_deaths -6.225e-02 7.287e-03 -8.542 < 2e-16 ***
## polio 2.176e-02 4.419e-03 4.923 9.06e-07 ***
## total_expenditure 6.932e-02 3.285e-02 2.110 0.03494 *
## diphtheria 2.573e-02 4.403e-03 5.844 5.76e-09 ***
## hiv_aids -5.670e-01 2.776e-02 -20.425 < 2e-16 ***
## gdp 2.992e-05 1.111e-05 2.693 0.00713 **
## thinness_1_19_years -1.003e-01 2.353e-02 -4.261 2.11e-05 ***
## income_composition_of_resources 5.929e+00 6.163e-01 9.619 < 2e-16 ***
## schooling 6.548e-01 4.275e-02 15.316 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.68 on 2508 degrees of freedom
## Multiple R-squared: 0.812, Adjusted R-squared: 0.8108
## F-statistic: 677 on 16 and 2508 DF, p-value: < 2.2e-16
small_cleaner_backward <- lm(life_expectancy ~ status + adult_mortality + infant_deaths + bmi + under_five_deaths +
total_expenditure + diphtheria + hiv_aids + gdp +
thinness_1_19_years + income_composition_of_resources + schooling,
data = life_clean1)
summary(small_cleaner_backward)
##
## Call:
## lm(formula = life_expectancy ~ status + adult_mortality + infant_deaths +
## bmi + under_five_deaths + total_expenditure + diphtheria +
## hiv_aids + gdp + thinness_1_19_years + income_composition_of_resources +
## schooling, data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.4482 -2.2757 -0.1305 2.1693 16.6982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.813e+01 6.432e-01 90.373 < 2e-16 ***
## statusDeveloping -1.469e+00 2.441e-01 -6.015 2.06e-09 ***
## adult_mortality -1.893e-02 9.212e-04 -20.545 < 2e-16 ***
## infant_deaths 7.881e-02 9.713e-03 8.114 7.55e-16 ***
## bmi 2.921e-02 4.824e-03 6.055 1.61e-09 ***
## under_five_deaths -5.938e-02 7.214e-03 -8.231 2.95e-16 ***
## total_expenditure 8.162e-02 3.293e-02 2.478 0.0133 *
## diphtheria 3.854e-02 3.612e-03 10.670 < 2e-16 ***
## hiv_aids -5.681e-01 2.771e-02 -20.501 < 2e-16 ***
## gdp 5.136e-05 6.387e-06 8.041 1.36e-15 ***
## thinness_1_19_years -1.070e-01 2.302e-02 -4.646 3.56e-06 ***
## income_composition_of_resources 5.856e+00 6.201e-01 9.444 < 2e-16 ***
## schooling 6.996e-01 4.184e-02 16.720 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.707 on 2512 degrees of freedom
## Multiple R-squared: 0.809, Adjusted R-squared: 0.8081
## F-statistic: 886.5 on 12 and 2512 DF, p-value: < 2.2e-16
## [1] 3.835798
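The bare number printed after the summary is not labelled in this report. It is slightly larger than the residual standard error, which would be consistent with a held-out or leave-one-out RMSE; that reading is an assumption. A minimal sketch of both metrics, using hypothetical helper names (`calc_rmse`, `calc_loocv_rmse`) and the built-in `mtcars` data:

```r
# Plain RMSE between observed and predicted values.
calc_rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Leave-one-out cross-validated RMSE for a linear model, computed from the
# leverages (hat values) without refitting the model n times.
calc_loocv_rmse <- function(model) {
  sqrt(mean((resid(model) / (1 - hatvalues(model)))^2))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
train_rmse <- calc_rmse(mtcars$mpg, fitted(fit))
loocv_rmse <- calc_loocv_rmse(fit)
c(train = train_rmse, loocv = loocv_rmse)
```

The LOOCV value always exceeds the training RMSE because each residual is divided by a factor between 0 and 1, so it gives a less optimistic basis for comparing the candidate models.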
# pairs(subset(life_clean1, select = c(life_expectancy, adult_mortality, infant_deaths, bmi,
# under_five_deaths, total_expenditure, diphtheria, hiv_aids,
# gdp, thinness_1_19_years, income_composition_of_resources,
# schooling)), col = "deepskyblue")
small_cleaner_backward_log <- lm(life_expectancy ~
status + log1p(adult_mortality) + log1p(bmi) +
log1p(infant_deaths) + total_expenditure + diphtheria + hiv_aids +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + schooling,
data = life_clean1)
summary(small_cleaner_backward_log)
##
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) +
## log1p(bmi) + log1p(infant_deaths) + total_expenditure + diphtheria +
## hiv_aids + log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources +
## schooling, data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.1168 -2.5378 -0.1681 2.3135 15.3633
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.91524 0.93765 61.766 < 2e-16 ***
## statusDeveloping -1.14888 0.24814 -4.630 3.84e-06 ***
## log1p(adult_mortality) -0.86561 0.08429 -10.269 < 2e-16 ***
## log1p(bmi) 0.24714 0.11845 2.086 0.0370 *
## log1p(infant_deaths) -0.68126 0.05959 -11.433 < 2e-16 ***
## total_expenditure 0.06821 0.03454 1.975 0.0484 *
## diphtheria 0.04212 0.00373 11.291 < 2e-16 ***
## hiv_aids -0.75770 0.02633 -28.776 < 2e-16 ***
## log1p(gdp) 0.47854 0.05400 8.863 < 2e-16 ***
## log1p(thinness_1_19_years) -1.04665 0.14609 -7.164 1.02e-12 ***
## income_composition_of_resources 7.86418 0.64373 12.217 < 2e-16 ***
## schooling 0.61234 0.04480 13.667 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.85 on 2513 degrees of freedom
## Multiple R-squared: 0.7938, Adjusted R-squared: 0.7929
## F-statistic: 879.7 on 11 and 2513 DF, p-value: < 2.2e-16
## [1] 3.985237
## studentized Breusch-Pagan test
##
## data: small_cleaner_backward_log
## BP = 218.74, df = 11, p-value < 2.2e-16
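The studentized Breusch-Pagan output above has the format printed by `lmtest::bptest()`. As a rough illustration of what that statistic measures, the sketch below recomputes it by hand in base R as n times the R-squared of an auxiliary regression of squared residuals on the model's predictors. The data frame `sim`, its columns, and the helper `bp_by_hand()` are hypothetical stand-ins, not part of the project code.

```r
# Studentized (Koenker) Breusch-Pagan statistic computed by hand with base R.
# `sim` is simulated data used only for illustration, not the project data.
set.seed(420)
n <- 500
sim <- data.frame(x1 = runif(n, 1, 10), x2 = rnorm(n))
# Error spread grows with x1, so the constant-variance assumption is violated
sim$y <- 2 + 0.5 * sim$x1 - sim$x2 + rnorm(n, sd = 0.4 * sim$x1)

fit <- lm(y ~ x1 + x2, data = sim)

# Hypothetical helper: regress squared residuals on the original design matrix
bp_by_hand <- function(model) {
  X <- model.matrix(model)            # includes the intercept column
  e2 <- resid(model)^2
  aux <- lm.fit(X, e2)                # auxiliary regression
  r2 <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  df <- aux$rank - 1                  # predictors excluding the intercept
  stat <- length(e2) * r2             # n * R^2
  c(BP = stat, df = df, p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

bp_result <- bp_by_hand(fit)
print(bp_result)
```

A small p-value, as in the output above, points to non-constant error variance, which motivates trying a transformed response.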
small_cleaner_backward_log_log <- lm(log1p(life_expectancy) ~
status + log1p(adult_mortality) + log1p(bmi) +
log1p(under_five_deaths) + total_expenditure + diphtheria + hiv_aids +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + schooling,
data = life_clean1)
summary(small_cleaner_backward_log_log)
##
## Call:
## lm(formula = log1p(life_expectancy) ~ status + log1p(adult_mortality) +
## log1p(bmi) + log1p(under_five_deaths) + total_expenditure +
## diphtheria + hiv_aids + log1p(gdp) + log1p(thinness_1_19_years) +
## income_composition_of_resources + schooling, data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.219788 -0.034180 -0.000628 0.033872 0.219947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.072e+00 1.365e-02 298.391 < 2e-16 ***
## statusDeveloping -8.827e-03 3.589e-03 -2.459 0.0140 *
## log1p(adult_mortality) -1.200e-02 1.220e-03 -9.829 < 2e-16 ***
## log1p(bmi) 4.242e-03 1.715e-03 2.473 0.0135 *
## log1p(under_five_deaths) -1.106e-02 8.244e-04 -13.419 < 2e-16 ***
## total_expenditure 8.463e-04 4.999e-04 1.693 0.0906 .
## diphtheria 6.518e-04 5.403e-05 12.065 < 2e-16 ***
## hiv_aids -1.196e-02 3.817e-04 -31.339 < 2e-16 ***
## log1p(gdp) 6.422e-03 7.822e-04 8.210 3.50e-16 ***
## log1p(thinness_1_19_years) -1.281e-02 2.115e-03 -6.058 1.58e-09 ***
## income_composition_of_resources 1.154e-01 9.316e-03 12.389 < 2e-16 ***
## schooling 8.779e-03 6.506e-04 13.493 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05572 on 2513 degrees of freedom
## Multiple R-squared: 0.7984, Adjusted R-squared: 0.7975
## F-statistic: 904.7 on 11 and 2513 DF, p-value: < 2.2e-16
## [1] 65.102
## [1] 1.545455
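The next model squares the response and is named after Box-Cox, so a power for the response was presumably estimated at this point. The code for that step is not shown. The sketch below is one way such a power could be estimated, by maximizing the Box-Cox profile log-likelihood over a grid in base R (`MASS::boxcox()` computes the same profile). The simulated data and the helper `profile_loglik()` are assumptions made for illustration.

```r
# Box-Cox profile log-likelihood over a grid of powers, base R only.
# `sim` is simulated so that y^2 is linear in x (true power close to 2).
set.seed(420)
n <- 400
sim <- data.frame(x = runif(n, 0, 5))
sim$y <- sqrt(10 + 20 * sim$x + rnorm(n, sd = 2))

# Hypothetical helper: profile log-likelihood of one candidate power
profile_loglik <- function(lambda, y, X) {
  # Box-Cox transform; the limit as lambda -> 0 is log(y)
  y_t <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(lm.fit(X, y_t)$residuals^2)
  n <- length(y)
  # Gaussian log-likelihood (up to a constant) plus the Jacobian term
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

X <- cbind(1, sim$x)                  # design matrix with intercept
lambdas <- seq(-2, 4, by = 0.05)
ll <- sapply(lambdas, profile_loglik, y = sim$y, X = X)
lambda_hat <- lambdas[which.max(ll)]
print(lambda_hat)
```

In practice the estimate is usually rounded to a convenient nearby power, which would be consistent with using `life_expectancy ^ 2` in the model that follows.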
small_cleaner_backward_box_cox <- lm(life_expectancy ^ 2 ~
status + log1p(adult_mortality) + log1p(bmi) +
log1p(under_five_deaths) + total_expenditure + diphtheria + hiv_aids +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + schooling,
data = life_clean1)
library(pracma)
## Warning: package 'pracma' was built under R version 4.0.2
##
## Attaching package: 'pracma'
## The following object is masked from 'package:purrr':
##
## cross
##
## Call:
## lm(formula = life_expectancy^2 ~ status + log1p(adult_mortality) +
## log1p(bmi) + log1p(under_five_deaths) + total_expenditure +
## diphtheria + hiv_aids + log1p(gdp) + log1p(thinness_1_19_years) +
## income_composition_of_resources + schooling, data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1534.41 -353.56 -38.63 316.95 2135.98
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3535.6599 130.7043 27.051 < 2e-16 ***
## statusDeveloping -243.9356 34.3816 -7.095 1.68e-12 ***
## log1p(adult_mortality) -120.5440 11.6899 -10.312 < 2e-16 ***
## log1p(bmi) 20.6667 16.4297 1.258 0.2086
## log1p(under_five_deaths) -91.7842 7.8964 -11.624 < 2e-16 ***
## total_expenditure 10.7517 4.7883 2.245 0.0248 *
## diphtheria 5.1823 0.5175 10.014 < 2e-16 ***
## hiv_aids -92.8286 3.6563 -25.389 < 2e-16 ***
## log1p(gdp) 66.9525 7.4928 8.936 < 2e-16 ***
## log1p(thinness_1_19_years) -156.7699 20.2606 -7.738 1.46e-14 ***
## income_composition_of_resources 1087.7867 89.2291 12.191 < 2e-16 ***
## schooling 79.7725 6.2318 12.801 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 533.7 on 2513 degrees of freedom
## Multiple R-squared: 0.7887, Adjusted R-squared: 0.7878
## F-statistic: 852.7 on 11 and 2513 DF, p-value: < 2.2e-16
## [1] 0
## [1] 4.246683
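When the response is transformed, predictions need to be mapped back to years before an error such as RMSE is comparable across models. The `65.102` printed after the log-response model is far larger than the other error values, which would be consistent with comparing log-scale predictions to raw life expectancy, although the code that produced it is not shown. The sketch below illustrates the back-transformation for a `log1p` response and a squared response. All data and object names are hypothetical.

```r
# RMSE on the original scale for models fit to a transformed response.
# `sim`, `trn`, and `tst` are simulated stand-ins for the project data.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

set.seed(420)
n <- 600
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- 40 + 3 * sim$x + rnorm(n, sd = 2)
train_idx <- sample(n, 450)
trn <- sim[train_idx, ]
tst <- sim[-train_idx, ]

fit_log <- lm(log1p(y) ~ x, data = trn)
fit_sq <- lm(I(y^2) ~ x, data = trn)

pred_log_raw <- predict(fit_log, newdata = tst)            # still on the log1p scale
pred_log <- expm1(pred_log_raw)                            # inverse of log1p
pred_sq <- sqrt(pmax(predict(fit_sq, newdata = tst), 0))   # inverse of squaring

rmse_wrong <- rmse(tst$y, pred_log_raw)   # mixes scales, so it is inflated
rmse_log <- rmse(tst$y, pred_log)
rmse_sq <- rmse(tst$y, pred_sq)
print(c(wrong = rmse_wrong, log1p = rmse_log, squared = rmse_sq))
```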
small_cleaner_backward_log_poly <- lm(life_expectancy ~
status + log1p(adult_mortality) + I(bmi ^ 2) * status +
log1p(under_five_deaths) + log1p(total_expenditure) +
diphtheria + I(diphtheria ^ 2) + hiv_aids + I(hiv_aids ^ 2) +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + I(income_composition_of_resources ^ 2) + I(income_composition_of_resources ^ 3) +
schooling + I(schooling ^ 2) + I(schooling ^ 3),
data = life_clean1)
summary(small_cleaner_backward_log_poly)
##
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) +
## I(bmi^2) * status + log1p(under_five_deaths) + log1p(total_expenditure) +
## diphtheria + I(diphtheria^2) + hiv_aids + I(hiv_aids^2) +
## log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources +
## I(income_composition_of_resources^2) + I(income_composition_of_resources^3) +
## schooling + I(schooling^2) + I(schooling^3), data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.3279 -2.0054 -0.1119 1.8930 13.7345
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.770e+01 1.070e+00 63.282 < 2e-16 ***
## statusDeveloping -5.138e-01 4.934e-01 -1.041 0.29775
## log1p(adult_mortality) -6.216e-01 7.243e-02 -8.582 < 2e-16 ***
## I(bmi^2) -2.024e-04 1.247e-04 -1.623 0.10464
## log1p(under_five_deaths) -2.535e-01 5.186e-02 -4.889 1.08e-06 ***
## log1p(total_expenditure) 8.234e-01 1.851e-01 4.450 8.98e-06 ***
## diphtheria -4.274e-02 1.354e-02 -3.157 0.00161 **
## I(diphtheria^2) 6.426e-04 1.210e-04 5.311 1.19e-07 ***
## hiv_aids -1.298e+00 5.243e-02 -24.764 < 2e-16 ***
## I(hiv_aids^2) 3.105e-02 2.186e-03 14.207 < 2e-16 ***
## log1p(gdp) 9.685e-02 4.811e-02 2.013 0.04423 *
## log1p(thinness_1_19_years) -7.074e-01 1.333e-01 -5.306 1.22e-07 ***
## income_composition_of_resources -4.477e+01 4.172e+00 -10.731 < 2e-16 ***
## I(income_composition_of_resources^2) 1.052e+02 1.101e+01 9.558 < 2e-16 ***
## I(income_composition_of_resources^3) -4.710e+01 7.454e+00 -6.318 3.13e-10 ***
## schooling 2.354e-01 2.481e-01 0.949 0.34277
## I(schooling^2) -2.845e-04 2.799e-02 -0.010 0.99189
## I(schooling^3) -3.337e-04 8.956e-04 -0.373 0.70946
## statusDeveloping:I(bmi^2) 1.057e-04 1.435e-04 0.736 0.46156
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.269 on 2506 degrees of freedom
## Multiple R-squared: 0.8517, Adjusted R-squared: 0.8507
## F-statistic: 799.8 on 18 and 2506 DF, p-value: < 2.2e-16
## [1] 3.733912
##
## studentized Breusch-Pagan test
##
## data: small_cleaner_backward_log_poly
## BP = 179.56, df = 18, p-value < 2.2e-16
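In the summary above, the schooling terms all have large individual p-values, which often happens when polynomial terms of the same variable are highly correlated. A partial F-test on nested models is one way to judge whether the higher-order term adds anything. The sketch below uses simulated data and hypothetical model names.

```r
# Partial F-test comparing a quadratic fit against a cubic fit.
# `sim` is simulated with a truly quadratic relationship.
set.seed(420)
n <- 500
sim <- data.frame(s = runif(n, 0, 20))
sim$y <- 50 + 1.5 * sim$s - 0.04 * sim$s^2 + rnorm(n, sd = 2)

fit_quad <- lm(y ~ s + I(s^2), data = sim)
fit_cubic <- lm(y ~ s + I(s^2) + I(s^3), data = sim)

# anova() on nested lm fits tests whether the extra term reduces the RSS
cmp <- anova(fit_quad, fit_cubic)
print(cmp)
p_cubic <- cmp[["Pr(>F)"]][2]   # p-value for adding the cubic term

# p-value of the quadratic term in the smaller model
p_quad <- summary(fit_quad)$coefficients["I(s^2)", "Pr(>|t|)"]
```

A large `p_cubic` would support dropping the cubic term, which is in line with the next model using only quadratic terms.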
small_cleaner_backward_log_poly <- lm(life_expectancy ~
status + log1p(adult_mortality) + I(bmi ^ 2) * status +
log1p(infant_deaths) + I(infant_deaths ^ 2) + log1p(total_expenditure) + I(gdp ^ 2) +
diphtheria + I(diphtheria ^ 2) + hiv_aids + I(hiv_aids ^ 2) +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + I(income_composition_of_resources ^ 2) +
schooling + I(schooling ^ 2),
data = life_clean1)
summary(small_cleaner_backward_log_poly)
##
## Call:
## lm(formula = life_expectancy ~ status + log1p(adult_mortality) +
## I(bmi^2) * status + log1p(infant_deaths) + I(infant_deaths^2) +
## log1p(total_expenditure) + I(gdp^2) + diphtheria + I(diphtheria^2) +
## hiv_aids + I(hiv_aids^2) + log1p(gdp) + log1p(thinness_1_19_years) +
## income_composition_of_resources + I(income_composition_of_resources^2) +
## schooling + I(schooling^2), data = life_clean1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.3087 -2.0397 -0.1989 1.9317 14.0502
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.523e+01 1.005e+00 64.897 < 2e-16 ***
## statusDeveloping -1.213e-01 4.967e-01 -0.244 0.80706
## log1p(adult_mortality) -5.972e-01 7.331e-02 -8.146 5.85e-16 ***
## I(bmi^2) -2.028e-04 1.261e-04 -1.608 0.10790
## log1p(infant_deaths) -3.246e-01 5.559e-02 -5.839 5.91e-09 ***
## I(infant_deaths^2) 1.264e-06 4.590e-07 2.755 0.00592 **
## log1p(total_expenditure) 7.334e-01 1.872e-01 3.919 9.15e-05 ***
## I(gdp^2) -1.190e-10 8.685e-11 -1.370 0.17074
## diphtheria -5.497e-02 1.372e-02 -4.006 6.36e-05 ***
## I(diphtheria^2) 7.948e-04 1.222e-04 6.502 9.53e-11 ***
## hiv_aids -1.373e+00 5.191e-02 -26.456 < 2e-16 ***
## I(hiv_aids^2) 3.386e-02 2.181e-03 15.528 < 2e-16 ***
## log1p(gdp) 1.465e-01 5.066e-02 2.892 0.00386 **
## log1p(thinness_1_19_years) -7.924e-01 1.367e-01 -5.798 7.54e-09 ***
## income_composition_of_resources -2.288e+01 1.661e+00 -13.777 < 2e-16 ***
## I(income_composition_of_resources^2) 4.003e+01 2.073e+00 19.304 < 2e-16 ***
## schooling 6.079e-01 1.070e-01 5.683 1.48e-08 ***
## I(schooling^2) -2.308e-02 5.271e-03 -4.379 1.24e-05 ***
## statusDeveloping:I(bmi^2) 1.988e-04 1.444e-04 1.376 0.16891
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.312 on 2506 degrees of freedom
## Multiple R-squared: 0.8478, Adjusted R-squared: 0.8467
## F-statistic: 775.8 on 18 and 2506 DF, p-value: < 2.2e-16
## [1] 3.828813
##
## studentized Breusch-Pagan test
##
## data: small_cleaner_backward_log_poly
## BP = 199.8, df = 18, p-value < 2.2e-16
small_cleaner_backward_log_poly <- lm(life_expectancy ~
(status + log1p(adult_mortality) + I(bmi ^ 2) * status +
log1p(under_five_deaths) + log1p(total_expenditure) +
diphtheria + I(diphtheria ^ 2) + hiv_aids + I(hiv_aids ^ 2) +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + I(income_composition_of_resources ^ 2) + I(income_composition_of_resources ^ 3) +
schooling + I(schooling ^ 2) + I(schooling ^ 3)) ^ 2,
data = life_clean1)
# summary(small_cleaner_backward_log_poly)
par(mfrow = c(2, 2))
plot(small_cleaner_backward_log_poly, col="dodgerblue")
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in predict.lm(small_cleaner_backward_log_poly, newdata = le_tst_data):
## prediction from a rank-deficient fit may be misleading
## [1] 3.286007
##
## studentized Breusch-Pagan test
##
## data: small_cleaner_backward_log_poly
## BP = 342.24, df = 162, p-value = 5.792e-15
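The "rank-deficient fit" warning above means some columns of the design matrix are exact linear combinations of others, so `lm()` sets their coefficients to `NA`. The sketch below shows how aliased terms can be listed. It forces aliasing with a duplicated column in simulated data, since the actual cause in the all-pairs interaction model is not shown in the output. All names are hypothetical.

```r
# Detecting aliased (NA) coefficients in a rank-deficient lm fit.
# `sim` is simulated; `x_copy` deliberately duplicates `x`.
set.seed(420)
n <- 200
sim <- data.frame(
  g = factor(sample(c("Developed", "Developing"), n, replace = TRUE)),
  x = runif(n)
)
sim$y <- 60 + 5 * sim$x - 2 * (sim$g == "Developing") + rnorm(n)
sim$x_copy <- sim$x

# (a + b + c)^2 expands to all main effects plus all two-way interactions
fit <- lm(y ~ (g + x + x_copy)^2, data = sim)

aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)
print(c(rank = fit$rank, n_coef = length(coef(fit))))
```

Terms listed in `aliased` carry no independent information, so removing them from the formula leaves the fitted values unchanged and avoids the prediction warning.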
life_clean_glm <- life_clean1
life_clean_glm$life_expectancy_class <- ifelse(life_clean_glm$life_expectancy > 65, "hi", "low")
life_clean_glm$life_expectancy_class <- as.factor(life_clean_glm$life_expectancy_class)
le_tst_data$life_expectancy_class <- ifelse(le_tst_data$life_expectancy > 65, "hi", "low")
le_tst_data$life_expectancy_class <- as.factor(le_tst_data$life_expectancy_class)
small_cleaner_backward_log_poly_logit <- glm(life_expectancy_class ~
status + log1p(adult_mortality) + I(bmi ^ 2) * status +
log1p(under_five_deaths) + log1p(total_expenditure) +
diphtheria + I(diphtheria ^ 2) + hiv_aids + I(hiv_aids ^ 2) +
log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources + I(income_composition_of_resources ^ 2) +
schooling + I(schooling ^ 2),
data = life_clean_glm, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Call:
## glm(formula = life_expectancy_class ~ status + log1p(adult_mortality) +
## I(bmi^2) * status + log1p(under_five_deaths) + log1p(total_expenditure) +
## diphtheria + I(diphtheria^2) + hiv_aids + I(hiv_aids^2) +
## log1p(gdp) + log1p(thinness_1_19_years) + income_composition_of_resources +
## I(income_composition_of_resources^2) + schooling + I(schooling^2),
## family = "binomial", data = life_clean_glm)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.03606 0.00003 0.00061 0.00929 0.73038
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.201e+01 4.412e+03 0.012 0.9906
## statusDeveloping -3.768e+00 4.412e+03 -0.001 0.9993
## log1p(adult_mortality) -6.510e-01 6.770e-01 -0.962 0.3362
## I(bmi^2) -3.904e-05 1.386e+00 0.000 1.0000
## log1p(under_five_deaths) -1.065e+00 5.876e-01 -1.813 0.0698 .
## log1p(total_expenditure) -2.014e+00 1.921e+00 -1.048 0.2945
## diphtheria -7.891e-01 5.971e-01 -1.321 0.1863
## I(diphtheria^2) 5.199e-03 3.906e-03 1.331 0.1832
## hiv_aids -3.548e-01 3.261e-01 -1.088 0.2766
## I(hiv_aids^2) 1.579e-02 1.888e-02 0.836 0.4032
## log1p(gdp) -2.296e-01 4.611e-01 -0.498 0.6185
## log1p(thinness_1_19_years) -8.196e-01 1.060e+00 -0.773 0.4396
## income_composition_of_resources 1.189e-01 1.192e+01 0.010 0.9920
## I(income_composition_of_resources^2) 2.001e+01 2.470e+01 0.810 0.4178
## schooling -2.479e-01 1.255e+00 -0.198 0.8434
## I(schooling^2) 5.314e-04 7.500e-02 0.007 0.9943
## statusDeveloping:I(bmi^2) -6.544e-05 1.386e+00 0.000 1.0000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 84.493 on 2524 degrees of freedom
## Residual deviance: 50.729 on 2508 degrees of freedom
## AIC: 84.729
##
## Number of Fisher Scoring iterations: 21
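For a binomial `glm()` with a factor response, R treats the first factor level as "failure" and all other levels as "success", so the response should be a two-level factor with its level order set explicitly. The sketch below illustrates this with simulated data, and adds a held-out misclassification rate as one possible way to evaluate such a classifier. The data, split, and 0.5 cutoff are assumptions, not the project's code.

```r
# Logistic regression on a two-level class label with explicit level order.
# `sim`, `trn`, and `tst` are simulated stand-ins for the project data.
set.seed(420)
n <- 600
sim <- data.frame(x = rnorm(n))
sim$life <- 65 + 4 * sim$x + rnorm(n, sd = 4)

# "low" is the first level, so the model estimates P(class == "hi")
sim$life_class <- factor(ifelse(sim$life > 65, "hi", "low"),
                         levels = c("low", "hi"))

train_idx <- sample(n, 450)
trn <- sim[train_idx, ]
tst <- sim[-train_idx, ]

fit <- glm(life_class ~ x, data = trn, family = binomial)

prob_hi <- predict(fit, newdata = tst, type = "response")
pred <- factor(ifelse(prob_hi > 0.5, "hi", "low"), levels = c("low", "hi"))

conf_mat <- table(predicted = pred, actual = tst$life_class)
misclass <- mean(pred != tst$life_class)
print(conf_mat)
print(misclass)
```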
fit <- lm(life_expectancy ~ ., data = subset(le_trn_data, select = -c(year, continent, status, country)))
summary(fit)
##
## Call:
## lm(formula = life_expectancy ~ ., data = subset(le_trn_data,
## select = -c(year, continent, status, country)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.0871 -2.0703 -0.1235 1.9829 14.3539
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.735e+01 5.817e-01 98.591 < 2e-16 ***
## adult_mortality -1.490e-02 7.896e-04 -18.874 < 2e-16 ***
## infant_deaths 4.676e-02 8.539e-03 5.476 4.76e-08 ***
## alcohol 8.362e-02 2.716e-02 3.078 0.002102 **
## percentage_expenditure 2.592e-04 7.363e-05 3.521 0.000437 ***
## hepatitis_b -1.560e-03 3.867e-03 -0.404 0.686613
## measles -1.036e-05 7.342e-06 -1.411 0.158476
## bmi 5.916e-03 5.226e-03 1.132 0.257779
## under_five_deaths -3.592e-02 6.232e-03 -5.765 9.14e-09 ***
## polio 2.207e-02 4.421e-03 4.993 6.34e-07 ***
## total_expenditure 3.291e-02 3.321e-02 0.991 0.321744
## diphtheria 2.471e-02 4.743e-03 5.209 2.05e-07 ***
## hiv_aids -3.662e-01 1.755e-02 -20.865 < 2e-16 ***
## gdp 2.949e-05 1.146e-05 2.574 0.010119 *
## population -2.744e-10 1.773e-09 -0.155 0.877014
## thinness_1_19_years -3.122e-02 4.779e-02 -0.653 0.513684
## thinness_5_9_years -7.835e-02 4.645e-02 -1.687 0.091780 .
## income_composition_of_resources 6.053e+00 6.126e-01 9.881 < 2e-16 ***
## schooling 6.489e-01 4.261e-02 15.230 < 2e-16 ***
## regionEurope & Central Asia 1.750e-01 2.845e-01 0.615 0.538452
## regionLatin America & Caribbean 8.310e-01 2.808e-01 2.959 0.003112 **
## regionMiddle East & North Africa 8.858e-01 3.152e-01 2.810 0.004985 **
## regionNorth America 7.232e-01 7.879e-01 0.918 0.358733
## regionSouth Asia 9.730e-01 5.291e-01 1.839 0.066026 .
## regionSub-Saharan Africa -4.609e+00 3.007e-01 -15.328 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.689 on 2610 degrees of freedom
## Multiple R-squared: 0.8509, Adjusted R-squared: 0.8495
## F-statistic: 620.5 on 24 and 2610 DF, p-value: < 2.2e-16